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Abstract 

Wc discuss importance sampling schemes for the estimation of finite time exit probabilities 
Oh . of small noise diffusions that involve escape from an equilibrium. A factor that complicates 

^ c*j the analysis is that rest points are included in the domain of interest. We build importance 

sampling schemes with provably good performance both pre-asymptotically, i.e., for fixed 
size of the noise, and asymptotically, i.e., as the size of the noise goes to zero, and that do 
not degrade as the time horizon gets large. Simulation studies demonstrate the theoretical 
results. 



> 

1 Introduction 

in 

| This paper considers the use of importance sampling for estimating hitting or exit probabilities 

' for stochastic processes. The process model is a d-dimensional diffusion X s = {X s (s), s 6 [0, oo)} 

■ satisfying the stochastic differential equation (SDE) 

m _ 

dX £ (s) = b(X £ (s))ds + ^a(X £ (s))dB(s), X £ (0) = x, (1.1) 

where e > and B(s) is a standard (i-dimensional Wiener process. Of particular interest is the 
$_i , case of gradient flows, b(x) = —DV(x), and constant diffusion coefficient, though many aspects 

of the analysis are more generally applicable. Let T> C M. d be an open set and denote by t £ 
the exit time of X £ (s) from T>. We are concerned with the estimation of quantities such as the 
probability that X £ leaves T> before some time T S (0,oo), or that it exits through a particular 
subset O C V before T, and related expected values. The principal novel feature of this work is 
that the initial point is in the neighborhood of an equilibrium point of the noiseless dynamics. 

The estimation of such probabilities has several mathematical and computational difficulties. 
It is related to the estimation of transition probabilities between different metastable states 
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within a given time horizon. As is well known, standard Monte Carlo sampling techniques lead 
to exponentially large relative errors as the noise coefficient e tends to zero. When rest points 
are in the domain of interest the situation is even more complicated than usual. This work will 
focus on this particularly difficult issue. 

The performance of unbiased estimators for rare event problems is usually measured by the 
size of the second moment of the estimator based on a single simulation. For a well designed 
scheme the ratio of this second moment to the quantity of interest will not grow too rapidly 
as e 4- 0. One measure is the exponential rate of decay of the second moment. If this rate of 
decay is exactly twice the decay rate for the probability of interest, then the scheme is called 
asymptotically efficient (or weakly efficient). The notion of strong efficiency requires that the 
ratio of the second moment to the square of the probability be bounded above uniformly for all 
small e > 0. While such performance is certainly desirable, it is not common when dealing with 
models such as (II. ip that involve state dependent dynamics and complicated geometries. As we 
describe below, in some sense both these measures are inadequate for the situation considered 
here. 

A theory based on subsolutions to an associated Hamilton- Jacob i-Bellman (HJB) equation 
has been developed for the design and performance analysis of importance sampling (see for 
example [6j [8l [5]). In this approach the change of measure (which for reasons made evident 
later on we will call the control) used in the importance sampling is defined in terms of the 
gradient of a subsolution, and the performance, as measured by the decay rate of the second 
moment, is given by the value of the subsolution at the initial location x. This theory is the 
starting point of our analysis of (jl.ip . though as mentioned previously the inclusion of rest points 
will motivate some further developments. 

For any particular class of process models and events, an essential step in the application of 
this approach is the construction of appropriate subsolutions. In this paper we will exploit the 
fact that the Freidlin-Wentsell quasipotential [10] can be used to construct various subsolutions 
for certain time independent problems related to (jl.ip . In addition, for particular but important 
classes of process models (e.g., gradient systems with constant diffusion matrix), the quasipo- 
tential and hence these subsolutions take explicit and simple forms. As we discuss in detail later 
on, these subsolutions also give subsolutions for the time dependent problems, and when T is 
large the value of the subsolution at the starting point (which now includes time t = as well 
as the location) will be close to the maximal value. 

It follows that if the final time T is large enough, then existing theory implies that the 
estimator based on this subsolution should have a nearly optimal decay rate for its second 
moment. While this is a valid statement, there is an important qualitative difference between 
problems which include a rest point in the domain of interest and those which do not. The 
distinction turns not on the decay rate, which behaves as expected in both situations, but 
rather depends on the pre-exponential terms not captured by the decay rate. When the domain 
does not contain a rest point one has simultaneously good rates of decay and control over the 
pre-exponential terms. However, when a rest point is present schemes based only on this time 
independent subsolution keep the desired decay rate but lose the good control over the pre- 
exponential terms. The qualitative difference is related to the fact that in the former case a 
subsolution designed on the basis of the e = problem can be shown to give useful bounds for 
the problem with e > 0, but in the latter case this is no longer true. This qualitative distinction 
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will be made precise later on, when we construct non-asymptotic bounds on the second moment 
for the two cases. When e > is small but not too small, the loss of performance due to the large 
pre-exponential term can be significant, rendering the associated importance sampling scheme 
little better than ordinary Monte Carlo. As e 4 the exponential decay rate dominates, and 
importance sampling once again gives much greater performance than ordinary Monte Carlo. 
However, the improvement is less than in the case where rest points are not included, and an 
approach which avoids this loss of performance would certainly be welcome. 

One can show that in fact there is no time independent subsolution that will resolve this 
difficulty, and so one must bring in some form of time dependence. It is not at all clear how 
one might incorporate time dependence throughout the entire domain [0, T] x D. An alternative 
approach, and the one we follow in this paper, is to combine an explicit solution to an approxi- 
mating time dependent problem in a neighborhood of the rest point with the time independent 
subsolution obtained via the quasipotential away from the rest point. As we will show, such an 
approach will maintain the high decay rate while at the same time properly controlling the pre- 
exponential term. In the neighborhood of the rest point, one can approximate the dynamics of 
the diffusion process by a Gauss-Markov process, i.e., a process with a constant diffusion matrix 
and drift that is afline in the state. For these dynamics and appropriate terminal conditions 
for the localizing problem, the solution to the related PDE can be constructed in terms to the 
famous linear/quadratic regulator problem from optimal control theory. As a consequence an 
explicit and nearly optimal scheme for a surrogate problem can be identified in the neighborhood 
of the rest point, which is then merged with the explicit scheme based on the quasipotential in 
that part of the domain where it is particularly effective. 

In this paper we analyze the difficulties caused by the presence of rest points in a general 
setting. We describe and theoretically justify a resolution of these difficulties in the case of 
dimension one, and present computational data for this case. The construction of the localizing 
problem is more elaborate in dimension greater than one, and will be presented in a companion 
paper along with the results of numerical experiments in higher dimensions. 

The contents of this paper are as follows. In Section [2] we review the relevant large deviation 
theory and importance sampling. In Section [3] we discuss the effectiveness of time independent 
subsolutions. In particular, we show that if rest points are not part of the domain of interest then 
subsolutions lead to both good decay rates and non-asymptotic bounds for the second moment 
of the corresponding unbiased estimator. However, when rest points are included in the domain 
of interest, the situation is more complicated and even if the decay rate is good, the prelimit 
bounds may not be as good as desired. In Section [H we present a change of measure for the 
problem with a rest point for a quadratic potential function with provably good pre-asymptotic 
and asymptotic performance and which does not degrade as T gets larger. In Section[5]we extend 
the discussion to the nonlinear problem with rest points. Simulation data, demonstrating the 
discussions in Sections H] and [5] are also presented in the corresponding sections. The Appendix 
has some auxiliary lemmas that are used in the main body of the manuscript. 
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2 Related Large Deviation and Importance Sampling Results 

In this section we recall well known large deviation results for probabilities of exit times (Subsec- 
tion [27T]), review importance sampling in the context of small noise diffusions (Subsection I2.2p . 
and also recall the notion of subsolutions to certain related HJB equations (Subsection 12. 3p . 

In most of this paper the following assumptions will be used. The assumptions are stronger 
than necessary, but simplify the discussion considerably. For example, the non-degeneracy of 
the diffusion matrix and regularity of the boundary of D easily imply that a limit exists for 
(|2.2p . They can be weakened, but the existence of the limit then requires conditions that are 
best addressed in a problem dependent fashion. 

Condition 2.1 i. The drift b is bounded and Lipschitz continuous. 

ii. The coefficient a is bounded, Lipschitz continuous and uniformly nondegenerate. 

Hi. T> is an open and bounded subset of M. d , and at all points on its boundary T> satisfies an 
interior and exterior cone condition, i.e., there is 5 > such that if x G &D, then there 
exist unit vectors v i , v 2 G M rf such that 

{y : \\y — x\\ < 5 and \(y — x, v±)\ < 5 \\y — x\\} CD 

and 

{y : \\y — x\\ < 5 and \(y — x, V2)\ < 5\\y — x\\} HV = 0. 

We also assume that if x G dT> and if 4> solves (ft = b(cj)), 0(0) = x, then (p(t) G T> for all 
t G (0,oo). 



2.1 Large deviation results 

Fix T G (0, 00) and consider an initial point (t, x) G [0, T) x T>. Consider a bounded and class 
C 2 function h : M. d 1— > M. Let Ej iX denote expected value given X £ (t) = x, and define 

9%t,x) = E t , x [e-^ (X£(T£)) V< T} ] . (2.1) 

Since 9 s scales exponentially it is also useful to define 

G £ {t,x) = -eln9 £ {t,x). (2.2) 

Although for now we focus on the case where h is bounded and continuous, one is also interested 
in cases where h is discontinuous and takes the value 00. An example is when for some set 
O C dV, h(x) = 00 if x G" O and h(x) = if x G O. In this case 6 £ (t,x) equals the probability 
of exiting T> through O before time T. For these cases and under mild regularity conditions on 
O, statements analogous to Theorem 12.21 below hold. 

Let AC (It, T] : M. d ) be the set of absolutely continuous functions on [t,T] with values in W d . 
We denote the local rate function by 

L(x, v) = — (y — b(x), a~ 1 (.x) [v — b(x)]^ 
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where a(x) = a(x)a T (x), and the corresponding rate or action functional for cf> 6 AC : 
by 

l tT {4>) = £ L(4>(s),<p( s ))ds. 

For all other <p G C ([t, T] : R d ) set I tT (<p) = oo. 

The following large deviations result is well known, e.g., [9l I10j. 

Theorem 2.2 Assume Condition \2.1\ Then for each (t,x) G [0,T) x £> 

UmG%t,x) = G(t,x) = inf [J tT (</>) + /»(#T))] , 

where 

A(t,x) = {</> G C([t,T] : M d ) : = x,0(s) e V for s £ [t,T],(f)(T) G 92?} . 
2.2 Preliminaries on importance sampling 

We briefly review the use of importance sampling for estimating 9 e (t,x) for a given function h. 
Let r £ (t, x) be any unbiased estimator of e (t, x) that is defined on some probability space with 
probability measure P. Thus T £ (t,x) is a random variable such that 

EF £ (t,x) = 9 £ (t,x), 

where E is the expectation operator associated with P. In this paper we will consider only 
unbiased estimators. 

In Monte Carlo simulation, one generates a number of independent copies of T £ (t, x) and the 
estimate is the sample mean. The specific number of samples required depends on the desired 
accuracy, which is measured by the variance of the sample mean. However, since the samples 
are independent it suffices to consider the variance of a single sample. Because of unbiasedness, 
minimizing the variance is equivalent to minimizing the second moment. By Jensen's inequality 

E(r £ (t,x)) 2 > (tr £ (t,x)f = e £ (t, x ) 2 . 

It then follows from Theorem 12.21 that 

limsup-elogE(r e (t,2;)) 2 < 2G{t,x), 

and thus 2G(t, x) is the best possible rate of decay of the second moment. If 

liminf-elogE(r £ (t,x)) 2 > 2G(t,x), 

E->0 

then T £ (t,x) achieves this best decay rate, and is said to be asymptotically optimal. While 
asymptotic optimality or near asymptotic optimality is desirable, as noted in the introduction 
one may also desire good behavior of the pre-exponential term. To keep the terminology clear 
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we will avoid the conventional usage of terms such as asymptotic optimality, and refer instead 
to properties of the "decay rate" and the "pre-exponential term." 

The unbiased estimators T £ (t,x) that we consider are all based on measure transformation. 
Consider u £ (s), a sufficiently integrable and adapted function, such that 

§ - exp Kf h " {a)fi " + 7i[ <" E M' dB( *»} 

defines a family of probability measures P e . Then by Girsanov's Theorem, for each e > 

B(s) = B(s) - 4= / u £ (p)dp, t<s<T 

is a Brownian motion on [t, T] under the probability measure P e , and X £ satisfies X £ (t) = x and 

dX £ (s) = b {X £ (s)) ds + a (X £ (s)) [^dB(s) + u £ (s)ds] . 

For our purposes, u £ (s) is either given as a process that is progressively measurable with respect 
to a suitable filtration that measures the Wiener process (sometimes called an open loop control) , 
or else it is of feedback form, in which case there is a suitably measurable function u £ : [0, T] x 
W 1 — > W 1 such that u £ (s) = u £ (s,X £ (s)). Of course when implementing importance sampling 
we consider only the latter form. Letting 

f 1 1 d¥ 

r(t,x)=exp|--MX £ (r £ ))|v< T} ^(X £ ), 

it follows easily that under P e , T £ (t,x) is an unbiased estimator for 9 £ (t,x). The performance 
of this estimator is characterized by its second moment 

exp{-\(^(r £ ))}v<r } (^(^)) 2 • 

The goal of this paper is to investigate the effect of rest points on Q £ (t, x; u £ ) and how one can 
choose controls that guarantee both good decay rates and pre-exponential bounds for Q £ (t, x; u £ ). 

We conclude this section with a review of subsolutions to related HJB equations. Such 
subsolutions are essential for constructing and analyzing good important sampling schemes. 

2.3 Subsolutions to a related PDE 

Let 

1 2 

M(x,p) = (b(x),p) - -\\a T (x)p\\ . 

The construction of good importance sampling schemes for a quantity such as (|2.ip is closely 
related to the HJB equation 

U t (t, x) + H(x, DU(t, x)) = for (f, x) £ [0, T) x V, (2.3) 

U(t, x) = h(x) for t < T, x G &D, U(T, x) = oo for x G V, (2.4) 

and more precisely to its subsolutions. It can be shown that G defined in Theorem 12.21 is the 
unique continuous viscosity solution of (|2.3p and (|2.4[) . see [9]. 



Q £ {t,x-u £ ) =E £ 
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Definition 2.3 A function U(t, x) : [0, T] x M. d \— > R is a classical subsolution to the HJB 
equation \2. 3\) and {2.$ if 

i. U is continuously differentiable, 

ii. U t {t,x) + W(x,DU(t,x)) > for every (t,x) G [0,T) x V, 

Hi. U(t, x) < h(x) for t <T,x £ <92? and £7(T, x) < oo /or x £ T>. 

The connection between subsolutions and the performance of importance sampling schemes 
has been established in several papers, such as [5]. These papers either consider classical 
subsolutions or, more generally, piecewise classical subsolutions. To simplify the discussion, we 
consider here just classical subsolutions. In the present setting, we have the following theorem 
regarding asymptotic optimality (Theorem 4.1 in [5]). 

Theorem 2.4 Let {X £ ,e > 0} be the unique strong solution to Consider a bounded 

and continuous function h : R i— > R and assume Condition \2.1\ Let U (t, x) be a subsolution 
according to Definition \2.3\ and define the control u e (s) = —a T (X e (s))DU(s,X e (s)). Then 

liminf-elog(3 e (t,x;M e ) > G(t,x) + U(t,x). 

£-S>0 

Since U is a subsolution it is automatic that G(s,y) > U(s,y) for all (s,y) € [0,T] x D. If 
G(t, x) = U(t, x) then the scheme has the largest possible decay rate. 

3 Qualitative properties of schemes based on subsolutions 

In this section we justify some of the claims made in the introduction regarding the differences in 
performance between importance sampling schemes when rest points are included in the domain 
of interest and when they are not. We consider just the problem of estimating the probability 
of escape from a set before time T, and even then consider a particular setup. However, the 
example will illustrate the difference between the two cases, and also suggest how one might 
improve the performance of importance sampling when rest points are involved. 

Remark 3.1 Much of the prior application of subsolutions to importance sampling [21 SI E] 
has involved the estimation of escape probabilities for classes of stochastic networks, in which 
case the origin is often the unique stable point for the law of large numbers dynamics [the 
analogue of (jl.ip with e = 0]. The event most often studied in this context is that of escape 
from a set (i.e., buffer overflow) before reaching the origin, after starting near but not at the 
origin. The analogous event for the diffusion model (jl.ip is one of the problems that are the 
focus of the present work. However, the difficulties that will be described momentarily for the 
diffusion model do not arise when dealing with the analogous estimation problem for stochastic 
networks, and indeed in that setting the proximity of the rest point has little impact on either 
the rate of decay or the pre-exponential term. This is related to the fact that the law of large 
numbers trajectories for stochastic networks reach the origin in finite time, as opposed to the 
infinite time it takes for the solution to (jl.ip with e = to reach a stable equilibrium point when 
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Figure 1: Escape problem with no rest point 



not starting at such a point. In turn, this property is responsible for the fact that minimizing 
trajectories in the definition of the quasipotential are achieved on bounded time intervals for 
stochastic network models, but take infinite time for processes such as (jl.ip . 

For the remainder of this section we concentrate on the special case of b{x) = —DV(x) and 
u{x) = /, and on a particular estimation problem. We first argue that if the domain of interest 
does not include a rest point, then given a time-independent subsolution and associated control 
not only is a good decay rate obtained, but good bounds on the pre-exponential terms hold as 
well. We then show why this is not possible when a rest point is included. 

Assume that x = is the global minimum for V(x), so that DV(0) = 0, and that DV(x) ^ 
for all x ytz 0. Without loss we assume that V(0) = 0. Let < I < L and define D = {i £ l d : 
£ < V(x) < L} and A c = {x G M. d : V(x) = c}. Then the problem is to estimate 

8 e (t, y) = F t , y {X s hits A L before hitting A e and before time T} , 

where the initial point y is such that t < V(y) < L. This corresponds to f)2. 1 [) . but here h is 
not bounded and smooth, and instead h(x) = if x £ Al and h(x) = oo if x £ Ag. For this 
problem one can also identify the rate of decay G(t,y) via (|2.2p . A one-dimensional example is 
illustrated in Figure CD 

The quasipotential with respect to the equilibrium point is defined by 

5(0, x) = inf {ior(0) : <t> G C([0, T] : R d ), 0(0) = 0, <j>(T) = x, T G (0, oo)}. 

It follows from the variational characterization of S that x — > 5(0, x) is always a weak sense 
solution to EI (x, —DS(0, x)) = 0, and therefore by adding an appropriate constant C to satisfy 
any needed boundary and terminal conditions, —5(0, a:) + C will always define a weak sense 
subsolution. In the present case 5(0, x) takes the explicit form (Theorem 4.3.1 in |10| ) 

5(0, x) = 2 (V{x) - V(0)) = 2V(x), 

and it is easy to check that U(x) = —2(V(x) — L) is a subsolution according to Definition 12.31 
Indeed, U(x) = for x £ Al, while the boundary condition U(x) < oo for x G A# and terminal 
condition U(x) < oo for x G T> hold vacuously. See Figure [2j The control (i.e., change of 
measure) suggested by this subsolution for the importance sampling scheme is u(x) = 2DV(x). 
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The following representation for Q £ (t,x;u) follows essentially from the arguments of Sub- 
section 2.3 of [Sj, which is based on the representation for exponential integrals with respect to 
Brownian motion given in pQ. The representation is given in terms of the value of a stochas- 
tic differential game, where the player corresponding to the importance sampling scheme has 
already selected their control (i.e., u). The characterization of performance for importance sam- 
pling in terms of games was first introduced in [6]. The only difference between the use here and 
in [5] is that there the function h is bounded, which is not true here since we consider an escape 
probability. However, the bounds stated below can be obtained by first replacing ool| r> ^} by 
M1{ T>T } and then letting M f oo. 

Let fa be a filtration satisfying the usual conditions of completion and right continuity and 
which measures the Wiener process. Let A denote the set of all ^-progressively measurable 
(i-dimensional processes v = {v(s),0 < s < T} that satisfy 



E 



Jo 



\v(t)\\ 2 dt < oo. 



Let e > be fixed and let X s be the unique strong solution to 



dX e {s) = -DV(X £ (s))ds + VedB(s) - [u(X £ (s)) - v(s)}ds 



with initial condition X s (0) = y. Let f £ denote the first time X s exits T>. Then 



elogQ e (0,y;u) = inf E 



\ r \\v(s)\\ 2 d S - 

1 Jo JO 



u(X £ (s)) 



ds + ool|f £> T} 



(3.1) 



It is important to note that (|3.ip provides a non-asymptotic representation for the per- 
formance measure Q £ (0,y;u). However, to obtain a more concrete statement regarding the 
performance of the importance sampling scheme, we will want bounds on Q £ (0,y;u) that are 
more explicit than the right hand side of (13.1j) . We do this by observing that when viewed as a 
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function of an arbitrary starting point (t,x), — e log Q £ (t,x;u) also satisfies a nonlinear PDE of 
the same general form as (I2.3P [plus terminal and boundary conditions], and thus lower bounds 
can be obtained by constructing subsolutions for this PDE. However, a key difference is that in 
contrast to (|2.3p . the PDE for (|3.ip involves a second derivative term. One cannot avoid this 
issue, in that second derivative information and e dependence are needed if one is to obtain 
non-asymptotic bounds, even when the change of measure is based on a first order equation. 

We next give the statement of the lower bound as it applies to the special case of this section. 
A more general statement and the proof will be given in Lemma 16. 11 However, the proof is an 
easy consequence of Ito's formula and the min/max representation 



Define 



M(x,p) = inf sup 

v u 

G £ [W](t,x 



(p, —DV(x) — u + v) 



- tt H — 
2 M II t- 4 



W t (t, x) + H(x, DW(t, x)) + -D 2 W{t, x) 



and let TV be a subsolution to £/ e [VF] = together with the boundary conditions W(t, x) = for 
t < T,x G Al, W(t, x) = oo for t < T,x G Ag, and terminal condition W(T, x) = oo for cc G P. 
Suppose u is the control based on a given smooth function U, i.e., u(t,x) = —DU(t,x). Then 



elog Q £ (0,y;u) 

inf E 

v<=A:t e <T w.p.l 

>2W(0,y) 



(3.2) 



+ inf E 

veA:f e <T w.p.l 



Next we show how 
define 



|f(s)|| 2 ds 



2g £ [W](s,X £ {s))ds 







2 


r 


u(s,X E (s)) 


ds 


Jo 







DW(s,X £ (s)) - DU(s,X £ (s)) 



ds 



can be used to obtain bounds that are uniform in T. For r) G (0, 1) 

U^(x) = (l-7])U(x), 



where U is the subsolution based on the quasipotential for V as above, and assume that U 
is twice continuously differ entiable. Then as with U the appropriate boundary and terminal 
inequalities hold for U* 1 . We next evaluate the right side of (|3.2j) when W = XJ ri and U = U . A 
straightforward calculation gives 

H(x, DU^ix)) = 2(t] - i] 2 ) || W(x)|| 2 , 



and therefore 

G £ [U n ]{x) - - \\DU^{x) - DU(x)\\ z = 2(7? - 2 V 2 ) \\DV{x)\\* - e(l - r])D 2 V(x). 



1 



2 119 1 1 2 

For e > but smaller than a constant that depends on inf xe £> ||DV(a;)|| and sup xg -p \\D V(x)\\ , 
there is rj = r](e) with 77(e) — > as e — > such that the last display is non-negative. We then 
obtain from (|3.2p the non-asymptotic upper bound 



Q £ (0,y;u) <e~^ 



Hl-ri)U(y) 
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Figure 3: Escape problem with a rest point 



Note that this bound is independent of T, and also that this argument is not possible when 
£ T>. Indeed, since H(0,p) = — ||p|| 2 /2 < for all p, G £ [U TI ](x) > is not possible for any 
choice of r] < 1 when £ T>. 

The quality of the bound obtained by this method depends on the degree to which the 
subsolution obtained for the PDE Q e [W] = (plus boundary and terminal conditions) accurately 
approximates the solution to this equation. In this example, we have used quite crude method to 
produce such a subsolution, which is to simply reduce a given subsolution to the e = equation 
by a constant factor of (1 — 7/). An examination of the calculations suggest that the bound 
is not at all tight, which turns out to be true. In fact, in this situation we can construct a 
better subsolution and hence a tighter bound. For example, so long as the origin is not included 
x 2 — L+£log(x/VL) can be used to obtain tighter bounds, though this function is not convenient 
for the time dependent problem. 

Note that the two functions U and U v play very different roles here. One is used to design 
an importance sampling scheme [here U], and one used for its analysis [here U^]. Indeed, U v 
with T) > is used only for the analysis of the scheme that corresponds to U and in particular 
to derive a bound that is independent of T. However, the design of the scheme and thus the 
simulation algorithm use the control u(t,x) = —DU(t,x). 

Next we consider the behavior of Q £ (0, y;u) when £ T>. In this case, we claim that 
Q £ (0, y; u) grows without bound in T for all e > 0, and therefore the performance of the control 
based on the quasipotential degrades as T becomes large. To show this is true, we use the 
game representation to establish a lower bound on Q e (0,y]u). We again examine a particular 
situation, which is to estimate the probability of escape from V = {x £ R d : V(x) < L} before 
time T, after starting at y at time 0. See Figure [3) The subsolution is still that of Figure [2J 

With the understanding that f e now represents the time of escape of X £ from {x £ M. d : 
V{x) < L}, the representation (|3.ip is still valid. Suppose that T is large and note that, while 
u{x) = —DU(x) = 2DV(x) destabilizes the origin when used to construct the measure used for 
importance sampling, in the representation (|3.ip it actually increases the stability of the origin, 
in the sense that —DV(x) — u{x) = — 3DV(x). As a consequence, it is easy to construct a 
control v which shows poor performance as T — > oo. The construction is suggested in Figured! 
With T large we divide [0, T] into an initial part [0, T — K) and a final part [T — K,T], with K 
fixed. During the first part we apply v(t) = 0. Because the resulting dynamics of X s are stable 
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X 



V 


v(t) = 


v(t) » 4W(X £ (t)) : 

























Figure 4: Construction of v 



about the origin, with very high probability the process settles around for the entire interval 
[0,T — K). In the game representation there is then a running cost of ||m(X £ (s))|| 2 , which one 
can check is of order e > 0. In the second portion we apply a control which leads to escape prior 
to time T, with a cost that may depend on K but is independent of T. An example of such a 
control, at least away from the origin, is as illustrated in Figure HI The precise details of the 
construction in this second part are not important. All that is needed is that such a control 
exists, which can easily be demonstrated. 

When the two parts are combined, we have a control that provides an upper bound of the 
form 

-elogQ e (0,y;n) < -ed[T - K] + C 2 , 
where C\ and C2 are positive constants. This shows that 

Q £ (0,y,u)>e c ^ T - K k-^\ (3.3) 

and thus for fixed e > and large T the term we have called the pre-exponential term dominates 
and the scheme is very far from optimal. We find that in this situation there are two exponential 
scalings, one in the noise strength and one in the length of the time interval, and the issue of 
which dominates depends on their relative sizes. 

These effects are reflected in computational data. In Tables [T] and [2] we present both estimated 
values and relative errors for the problem of escape from an interval of the form [—.A, A], with 
A = 1. The process is a one dimensional Gauss-Markov model with drift — cx and diffusion 
coefficient ^feo [see (j4.1j) ]. with c = a = 1. In the tables, values of T appear at the top 
and values of e along the left side. Each computed value is based on 10 7 samples. A dash 
indicates that no samples escaped. To ease the presentation the relative errors are rounded to 
the nearest integer. Owing to the fact that the subsolution based on the quasipotential is far 
from optimal in any sense when T is small, the relative errors are large for small T and decrease 
until approximately T = 2. For larger T the errors grow rapidly with T. Note that the estimated 
relative errors in Table [2] are not necessarily accurate for large T, since they are subject to the 
same errors that can affect the probability estimates, but do indicate the qualitative worsening 
of estimation accuracy. 
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e\T 


0.25 


0.5 


1 


1.5 


2.5 


10 


14 


18 


0.20 


9.8e- 07 


2.0e - 04 


3.3e- 03 


9.0e-03 


9.1e-03 


1.2e - 01 


1.7e-01 


2.0e-01 


0.16 


3.6e- 08 


2.5e - 05 


7.3e - 04 


2.2e-03 


6.6e-03 


4.0e- 02 


5.7e-02 


6.4e - 02 


0.13 


9.8e- 10 


2.3e - 06 


1.3e- 04 


5.1e-04 


1.6e-03 


Lie - 02 


1.5e - 02 


2.7e- 02 


0.11 


4.7e- 11 


2.4e - 07 


2.4e - 05 


Lie -04 


3.9e - 04 


2.8e- 03 


4.0e - 03 


6.0e-03 


0.09 




8.8e - 09 


2.1e-06 


1.2e-05 


5.2e - 05 


4.0e - 04 


5.8e-04 


8.3e - 04 


0.07 




5.5e - 11 


5.0e - 08 


4.3e- 07 


2.2e - 06 


1.9e- 05 


2.8e-05 


3.7e- 05 


0.05 




5.6e- 15 


5.9e- 11 


9.7e- 10 


6.9e-09 


7.1e- 08 


Lie - 07 


1.3e- 07 



Table 1: Using the subsolution based on quasipotential throughout. Estimated values for differ- 
ent pairs (e,T), two sided problem. 



e\T 


0.25 


0.5 


1 


1.5 


2.5 


10 


14 


18 


0.20 


91 


7 


2 


1 


1 


10 


51 


179 


0.16 


253 


10 


2 


1 


1 


10 


48 


139 


0.13 


748 


16 


3 


1 


1 


9 


48 


378 


0.11 


1594 


26 


3 


1 


1 


10 


42 


272 


0.09 




49 


4 


2 


1 


9 


43 


357 


0.07 




127 


5 


2 


1 


8 


47 


251 


0.05 




714 


8 


2 


1 


8 


42 


145 



Table 2: Using the subsolution based on quasipotential throughout. Relative errors per sample 
for different pairs (e,T), two sided problem. 

Tables [3] and 0] present the approximated values and relative errors for the problem with the 
domain (— oo, A] and escape possible therefore only at A. The results are of the same qualitative 
form as before, and carried out only to T = 10. 

It is useful to compare the two situations, and identify why uniform control of pre-exponential 
terms was not possible when G T>. In both cases the control was based on the quasipotential, 
which is a valid subsolution to the e = problem. When using Ito's formula to bound the 
second moment of the estimator, we must of course deal with the second derivative term, which 
is multiplied by e > 0. It can happen that this term has a sign that degrades (increases) the 
second moment, and indeed this is always true in a neighborhood of the origin (this is essentially 
due to the convexity of V near the origin). For the case where ^ T> and for sufficiently small 
e > 0, this could be balanced by using that when i/O u(x) and therefore DU{x) are nonzero. 
However, this is not possible when the rest point is included in the domain of interest. Indeed, 
the running cost that is accumulated in the construction leading to (|3.3I) corresponds to this 
term, and as that argument shows it cannot be removed. The construction also suggests how 
the large variance comes about, which is that some trajectories generated under the change of 
measure defined by u remain in a neighborhood of the origin for a long time, in spite of the fact 
that with such dynamics the origin is an unstable equilibrium point. The likelihood ratios along 
these trajectories can vary greatly and, even though they are themselves relatively unlikely, they 
are likely enough to increase the variance of the estimator to the point where it will become 
worse than standard Monte Carlo. As such, they are reminiscent of the "rogue" trajectories 
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e\T 


0.25 
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1.5 


2.5 


7 


10 


0.20 


4.6e-07 


l.Oe - 04 


1.7e- 03 


4.5e - 03 


l.le-02 


4.2e-02 


6.2e- 02 


0.16 


2.1e-08 


1.3e - 05 


3.7e - 04 


1.2e-03 


3.3e-03 


1.3e-02 


2.0e - 02 


0.13 


2.7e- 10 


Lie -06 


6.5e- 05 


2.5e-04 


7.9e- 04 


3.5e-03 


5.3e- 03 


0.11 


lAe - 11 


1.2e - 07 


1.2e- 05 


5.7e - 05 


2.0e- 04 


9.2e-04 


1.4e- 03 


0.09 




4.3e - 09 


l.le-06 


6.5e-06 


2.6e- 06 


1.3e-04 


2.0e - 04 


0.07 




2.4e - 11 


2.5e- 08 


2.2e-07 


l.le-06 


6.1e-06 


9.3e - 06 


0.05 




1.7e- 15 


3.0e - 12 


4.9e- 10 


3.5e- 09 


2.2e-08 


3.5e- 08 



Table 3: Using the subsolution based on quasipotential throughout. Estimated values for differ- 
ent pairs (e,T), one sided problem. 



e\T 


0.25 


0.5 


1 


1.5 


2.5 


7 


10 


0.20 


132 


10 


3 


2 


2 


5 


15 


0.16 


331 


15 


3 


2 


2 


4 


14 


0.13 


1418 


23 


4 


2 


2 


4 


14 


0.11 


3162 


36 


4 


2 


2 


4 


14 


0.09 




70 


6 


3 


2 


4 


13 


0.07 




194 


7 


3 


2 


4 


12 


0.05 




1300 


12 


4 


2 


4 


12 



Table 4: Using the subsolution based on quasipotential throughout. Relative errors per sample 
for different pairs (e,T), one sided problem. 

which lead to poor performance of non-dynamic forms of importance sampling as discussed in 

p on ng. 

It will turn out that to overcome the difficulties introduced by the rest point one must do a 
much better job of approximating the optimal change of measure than is possible based just on 
a time and e-independent subsolution. However, it also turns out that the additional accuracy 
is needed only near the rest point, where in fact explicit time and e-dependent solutions can 
be found. These are then combined with the simple time-independent subsolution based on 
the quasipotential to produce schemes that are nearly optimal and which protect against both 
sources of significant variance. An overview of the construction of such schemes is the topic of 
the next section. 

4 Combining subsolutions with a refined local analysis: the lin- 
ear problem 

In this section we combine a local analysis that produces a time and e-dependent scheme near the 
rest point with a scheme based on the quasipotential elsewhere. There are of course few process 
models and problems for which the related HJB equation can be solved explicitly. However, 
a class of processes where this is possible are the Gauss-Markov models, i.e., SDEs with drift 
that is linear in the state and constant diffusion matrix. For these processes and for terminal 
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conditions of the appropriate form, both the limit (e = 0) PDE and the prelimit (e > 0) PDE 
have an explicit solution that can be expressed in terms of the value function of a linear-quadratic 
regulator (LQR) control problem. 

Our ultimate approach to the construction of importance sampling schemes is suggested by 
Figure [5j The problem of interest is of the form (|2.ip or an analogous problem involving escape 
from T> prior to T. The particular problem will fix the boundary and terminal conditions on 
[0, T] x dT> and {T} x T>. In the figure, we are interesting in estimating the exit probability 

e e = P 0i0 {X s hits - Ax or A 2 before time T} , 

where is the rest point and T> = {—Ax, A2) with A\,A 2 > 0. 

In most of the domain, which is the section outside the curves that terminate at ±2 and after 
time T — t* , the control is based on a subsolution U constructed in terms of the quasipotential. 
Within the curves the solution to (II. ip is well approximated by a Gauss-Markov process. Hence 
within this region we will use a control that would be appropriate for a problem if the process 
were instead the approximating Gauss-Markov model. The function H that defines the PDE 
for this region is therefore the one corresponding to the Gauss-Markov process. Besides the 
dynamics in this region (which are determined by the Gauss-Markov approximation), we must 
choose a terminal condition. This will be given by the minimum of two quadratic functions, one 
centered at x and one centered at — x. The parameter x is a nominal value that for purposes 
of the present discussion can be taken to be 1. The parameter t* plays two roles. One is to 
determine the size of the region on which the true dynamics are approximated by Gauss-Markov 
dynamics. The second role is related to the fact that if the control based on the quadratics 
centered at ±x is used all the way to T then singularities will develop. For this reason, we 
switch back to the control based on the quasipotential after T — t*, and t* must be chosen so 
that the subsolution property is preserved across the handoff time T — t*. 

While the discussion above suggests the correct decomposition of the domain, the actual 
construction is more complex, since the control should transition nicely when moving between 
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the regions, and the construction of a scheme for which rigorous bounds can be proved will 
require additional mollification and approximations. However, the building blocks are always 
subsolutions to the indicated PDEs. The form of the surrogate problem for the Gauss-Markov 
model needs to be explained, as well as various boundary and terminal conditions. To simplify 
the discussion, we consider a sequence of successively more general problems. In this paper we 
complete the analysis for the one-dimensional problem. The multi-dimensional problem will be 
addressed in a companion paper. 



4.1 A one-dimensional Gauss-Markov model 

Our first goal is to construct a subsolution for the e = problem for the processes that will be 
used in the localization. This can be related to the problem of estimating the probability that 
the solution to 

dX £ (s) = -cX £ (s)ds + y/sadB{s), X £ {t) = x e {-A, A) 

escapes from the interval [—A, A] before time T, a problem that was also used for the computa- 
tional examples of the last subsection. The parameters c and a are positive constants. In this 
subsection we will take x = A, and because of this can postpone the issue regarding singularities 
in the control at t = T. We thus also take t* to be zero, and will return to the role of t* and its 
selection for a general problem in the next subsection. The corresponding PDE for the escape 
probability is 

Uf + M(x,DU £ ) + e -o 2 D 2 XJ £ = 0, M(x,p) = -cxp-^a 2 p 2 , 



plus the terminal and boundary conditions 

U £ {t,x) = 



x = ±A,t S [0,T], 
oo x G (—A, A),t = T. 

While simple in appearance, this equation does not have an explicit solution. 

The equation obtained in the limit e — > is more tractable, and the unique viscosity solution 
can be described as follows. U°(t,x) corresponds to the variational problem 

T 1 2 

s) + ccj)(s) ds : 4>(t) = x,\<fi(s)\ > A some s S [t,T] 



2 

Depending on how and when the minimizing trajectory leaves [—.A, A], the solution takes a 
particular explicit form. (In all cases the minimizer can be found by solving the appropriate 
Euler-Lagrange equation.) If for the initial condition (t, x) the minimizing trajectory leaves 
before time T, then 

U°(t,x) = F 1 (x) = 4 [A 2 -x 2 ] . 

This is the case when the minimal cost is the negative of the quasipotential, translated by a 
constant to satisfy the boundary condition at the exit location. Such initial conditions satisfy 
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F 2 (t, -x 



F l (x) 




-Ae^ Ae cl -'- T > 



Figure 6: The e = solution for a fixed t. 



\x\ > Ae c ^ T \ When < x < Ae c ^ T ) the minimizer leaves through A at exactly time T, and 
the minimizing value is 



U°(t,x) = F 2 (t,x) 



c (A - xe 



c{t-T)\2 



[l_ e 2c(t-T)j • 



G 



One can also interpret F 2 as the minimal cost for a linear quadratic regulator with a singular 
terminal cost applied at time T, i.e., a cost that equals at A and oo otherwise. 

By symmetry it is clear that when x < the minimizing trajectory will exit at —A. Define 



Ul(t,x) 

Setting U®(t,x) = U+(t, —x), we have 

U°(t,x) 

Note that when x = Ae c ( t ~ T * > 

c (A-xe c (*- T )) 2 



Fx(x) , if x > Ae^-^ 
F 2 (t,x) , if < x < Ae c ^- T ^ 



(4.1) 



'Ul(t,x) 
U°(t,x) 



if x > 
if x < 



F 2 (t,Ae c ^) 



a 2 [l-e^-T)} a 2 [1 - e 2c (*- T )] 



c (A-Ae 2 ^) 2 _c_ 2 

/r-2 L 



G 



Fi(^le c(i ' T) ), 



and therefore U°(t,x) is continuous for all (t,x) E [0,T) x [—A, A]. In fact more is true, and 
one can check that 

DF 2 (t,Ae c(t - T) ) = DF^Ae^"^), 

and thus U°(t,x) has a continuous partial derivative in x for x S (0,A). The mapping x — > 
F 2 (t,x) is convex, and the graph of this mapping lies above that of Fi(x), which is concave. 
Thus the two graphs intersect only at the point (Ae c ^~ T \ cA 2 [1 — e 2c ^~ T ^]/a 2 ), where both 
functions and their first derivatives in x agree. See Figure [6l Note also that U°(t,x) = U^_(t,x) 
for x £ (— oo, A] is the solution to the e = problem with escape from (— oo, A] (a "one-sided" 
version of the problem of escape from [—A, A]). 
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Simulation data for the schemes based on these two value functions are presented below. 
The approximated values are omitted since they are qualitatively similar to those in Tables Q] 
and El and only relative errors are presented. Table [5] gives data based on U° for the problem 
of two-sided exit, and should be compared to Table [2j The use of the solution to HJB equation 
for e = drastically improves the performance for small T, which is due to the fact that the 
subsolution based on the quasipotential is a very poor approximation to the solution for e > 
for such T. However, for large T the two schemes are comparably bad. 



e\T 


0.25 


0.5 
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1.5 


2.5 


10 


14 


18 


0.20 


2 


2 


1 


1 


1 


10 


57 


266 


0.16 


3 


3 


1 


1 


1 


10 


55 


265 


0.13 


4 


2 


1 


1 


1 


9 


51 


394 


0.11 


3 


2 


2 


1 


1 


9 


44 


177 


0.09 


3 


3 


2 


1 


1 


9 


43 


314 


0.07 


3 


3 


2 


2 


1 


9 


67 


520 


0.05 


4 


2 


2 


2 


1 


8 


50 


278 



Table 5: Using the subsolution based on the explicit solution to the e = HJB equation. Relative 
errors per sample for different pairs (s,T), two sided problem. 

The data for exit from one side are given in Table EJ In contrast with the two-sided problem, 
the performance does not degrade so quickly with large T. This suggests that the decline in 
performance is not so much due to using the approximating e = PDE, but rather the possible 
lack of regularity of the solution to this equation. Recall that the difficulty with the use of the 
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10 
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1 
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3 
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1 
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3 


0.11 


1 


1 


1 


1 


1 


2 


3 


0.09 


1 


1 


1 


1 


1 


2 


3 


0.07 


1 


1 


1 


1 


1 


2 


3 


0.05 


1 


1 


1 


1 


1 


2 


3 



Table 6: Using the subsolution based on the explicit solution to the e = HJB equation. Relative 
errors per sample for different pairs (e,T), one sided problem. 

subsolution based on the quasipotential alone (here Fi(x)) was that in a neighborhood of x = 
the second derivative was negative, and when T is large this leads to poor control of the variance 
of the associated scheme for both the one-sided and two-sided problems. 

When using the time-dependent e = PDE as the basis for a scheme for the one-sided 
problem, the introduction of i*2(i, x) appears to have largely mitigated the problem due to the 
second derivative term. Note that this function determines the value of U°(t, x) when x = and 
is convex rather than concave in i. In contrast, for the two-sided problem there is a concave 
singularity at the origin. There the second derivative in x is — oo, and the subsolution property 
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for the e > problem again fails. What is needed is a subsolution that works across the point 
x = for the e > problem. Such a subsolution can be constructed using the mollification 
introduced in the next subsection. Simulations based on such a mollified version of the e = 
subsolution (and with mollification parameter 5 = 2e) are presented in Table and support the 
claim just made. 
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4 


58 
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2 
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Table 7: Using the subsolution based on the mollification of U+(t,x) A U^(t,x). Relative errors 
per sample with x = A = 1, two sided problem. 

In spite of the shortcomings we have described in this section, the solution to the e = 
problem serves as the starting point for a construction that can be shown to perform well both 
in theory and in practice. There are three important modifications that are needed. 

• The first is that, in order to effectively deal with the e > dynamics and in particular 
to avoid the degradation still present in Table [6] when T is large, the region where the F2 
subsolution determines the dynamics must be enlarged. The solution to the e = problem 
constructed above leads to a region that vanishes exponentially in t — T [see (|4.ip ]. 

• The second modification will be the use a mollification to eliminate singularities such as 
the one at x = and help guarantee a global subsolution property for the e > PDE. 
The particular mollification we use is very convenient, and was first used for importance 
sampling in [7J, though with a somewhat different intended use. 

• The final modification was also alluded to previously, which is to revert to the quasipo- 
tential based control in an interval of the form [T — t* ,T]. All three modifications will be 
introduced in the next subsection in the context of the Gauss-Markov process. 

4.2 Simulation scheme for the Gauss-Markov model 

In this subsection we generalize the construction of the last subsection. As discussed there, the 
generalizations are needed to address issues that play a role in both the theoretical analysis 
of the scheme and its practical performance. After introducing these generalizations, we will 
demonstrate (numerically and theoretically) that the suggested change of measure does not 
degrade in performance as T gets larger and is close to optimality not only as e J. 0, but for 
fixed e > as well. 

We begin by introducing the first generalization, which is to replace the terminal condition 
00 • [x — x) 2 + Fi(x), which was used to define F2(t,x), by a terminal condition of the form 
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M(x— x) 2 /2+Fi(x). The parameter x replaces A and is a nominal value introduced to disconnect 
the localization from the boundary. The solution to the LQR that corresponds to M(x — x) 2 /2 + 
F\(x), which will be denoted F^it, x), is automatically smaller than F2(t,x) = F£°(t,x). The 
motivation for replacing oo by M/2 is because we want the solution to the LQR problem [i.e., 
F^it, x)\ to determine the control near x = for a set whose width is uniformly (in t) bounded 
below away from zero. As discussed in the last section, the second derivative term associated with 
F\ is of the wrong sign and degrades performance. The neighborhood where F^^t^x) < F\(x) 
does not degenerate as T — t — > oo, and its size is decreasing in M. The introduction of M 
complicates the construction by also requiring mollification (in addition to the one that will be 
needed at x = 0), since F^ can no longer be smoothly merged with F\. 
For M E (0, oo) the solution to this LQR takes the form 



Ff( t , x) = a- it) (, - xe"*-)) 2 + «<*), - (2e/M ;^ 2e2e| ,- T , > °. 



Recall that F x (x) = c[A 2 - x 2 ]/a 2 so that Fi(A) = 0. Define F$f + (t,x) = F^ 1 '(t,x),F$f_(t,x) = 
F^(t, —x). It will be important to know which of F^, F^_, and F\ is smallest, and we note 
here several properties. Let K = 2c/ M + a 2 . The first is that there are two real solutions to 
F2 + (t,x) = F\(x), and these take the form 



— [ ± .KTAe 2 ^ (4 2) 

K \ ± VMa 4 Ma 2 6 )' 1 j 

Between these roots F^At, x) < F\(x), and on the complement of the interval the reverse 
inequality holds. The limit t — T — > — oo gives the asymptotic endpoints of the interval where 
F$ 4 + (t,x) < Fi(x), which arc 

+ " / 2c 
V2c + Ma 2 ' 

We point out here that a natural scaling for this problem, given that the size of the neigh- 
borhood of zero where the quasipotential based subsolution fails for the e > system scales like 
y/e, is to ask that this width scale as 2e K , n € [0, 1/2]. If for example the desired width is 2e 1//4 , 
then when e > is small then we should take M ~ 2x 2 c/a 2 e 1 ^ 2 . 

The next adaptation is required so that singularities in the control associated with F^±(t, x) 
as t f T do not cause a problem, as well as for purposes of localization. For a parameter t* > 
we will want U°(T - t*,x) < Fi(x) for all x £ [—^4,^4], where (with an abuse of notation) 
U°{t,x) = F^ + (t,x) A U%(t,x) A F$t(t,x) A U°{t,x). This is true if and only if the smaller 
solution to FJjf,(T — t*,x) = Fi(x) is less than or equal to zero. This root was found to be 




2cK 2c 



Ma A Ma 2 



e -2ct* 



and the restriction that this be non positive can be simplified to 

1 , 2c 

t > -7-log- 



2c 6 Ma 2 ' 
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The inequality U°(T -t*,x)< Fi(x) will ensure that the subsolution property is preserved 
if we switch from using U° for t < T — t* to using F\ for t > T — t*. If T < t* this means that 
we always use F\, but our interest here is in large T. 

We assume that M > Ac/a 2 so that t* > 0. To guarantee that the smaller root is strictly 
negative and to conveniently satisfy a bound used later on we assume 

Besides enforcing the subsolution property across the handoff at time T — t*, the selection of 
t* plays a key role in determining the region used for the localization for the general nonlinear 
problem. Owing to the exponential decay, one can confine the localization to a small region with a 
modest value of t* . Suppose we consider confining to a region that scales as 2s 1 / 4 for all t £ [0, T— 
t*} and arbitrarily large T. As discussed previously, for small t and large T — t* this suggests that 
M be approximately 2x 2 c/a 2 e 1 ^ 2 , which means that t* > — ~ logfe 1 / 2 /^ 2 ]. Recalling that the loss 
in performance of the importance sampling scheme when the quasipotential based subsolution 
is used scales like ec, this gives a loss over such an interval of the form [T — t*, T] as scaling like 
—elog[e/x 2 ]. We will return to such considerations in the next section. 

Finally there is the issue of mollification. Owing to the replacement of F%± (t, x) by F£± (t, x), 
there are now several sources of discontinuity in the gradient. The subsolution prior to mollifi- 
cation would in general take the form F^ (t , x) A (i, x) A F^_ (i, x) A (t,x). However, since 
we know that F^ + (t,x) A F^it^x) is dominated by F±(x) near x = 0, we can also use 

F 2 M + (t,x)AFi^(t,x)AF l (x). 

See Figure [71 

We next state a result that will be used to derive performance bounds for schemes based 
on the mollification. The proof is deferred to the Appendix. We consider the general one 
dimensional process model 

dX e = b{X e )dt + ^fecr{X e )dB, 
where b and a are Lipschitz continuous. Letting a(x) = cr(x) 2 , the relevant e-dependent PDE is 

Q £ [U] (t, x) = U t (t, x) + DU(t, x)b(x) - - \a{x)DU{t, x)\ 2 + ^a{x)D 2 U(t, x) = 0. 
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Lemma 4.1 Suppose that the functions Ui : [0,T] x R, i = 1, ...,n are £u>ice continuously 
differentiable in x, once continuously differentiable in t, and satisfy 



For 5 > define 

and define the weights 

Then 

min $Ui(t,x), 
and for < e < 5 



G e [Ui}(t,x)> 7l ( Xl t,e). 



U s {t,x) = -5 log K] e -|^(^) 

\i=l / 



i = 1, . . . , n 



e [tf*](t,s) > £(1-^ 



i=l 



| > U s (t,x) > miniUi(t,x),i = 1, . . . ,n| — Jlogn, 

n 

X! Pi (*> x > 5 ) o-(x)DUi(t, : 
i=l 



^ ft, x; 6) a(x)DUi(t,x) 



+ ^2 Pi {t,x;5)ji(x,t,e) 
i=i 

> J^Pi (i,x;(5)7i(x,t,e). 



i=l 



Based on this result we consider the mollification of U° (t,x) = (t , x) A F 2 A 1 (i, x) A F\ (x) 
which is 

U 5 (t,x) = -Slog (e-fo"^ +e-^ M -^ + e -**(*>) . 



For reasons that will be made clear in the analysis (see Lemma |4.5|) . we generally use 5 = 2e. 

As discussed previously, we return to the control based on the quasipotential for the last t* 
units of time, and so the subsolution takes the form 



U\t,x) 



Fi(x), t>T-t* 
U 5 (t,x), t<T-t* " 



(4.4) 



Note that the mollification reduces values, in that U s (t,x) < U°(t,x), and so the requirement 
II s (T - t*,x) < F x {x) for x G [—A, A] holds, since U°(T - t*,x) < Fi(x). 
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4.3 Analysis of the scheme 

We next present a rigorous and nonasymptotic bound for the second moment of the importance 
sampling scheme constructed in the last subsection. To derive a bound that is valid for e > 
and uniform in T, we use the same representation as in (|3.2p . In particular, we choose U(t, x) = 
U s (t,x) defined via (|4.4p for the design and W(t,x) = U s,r, (t, x) for the analysis, where 

1 ' ' ~ \ U s "(t,x), t<T-t* ' 

with 

U s *(t,x) = (1 - i])U s (t,x). 

As discussed in Section 3 this choice is driven by the need for a subsolution for the s > dynamics 
with an explicit form, and this limitation of the technique leads to conservative bounds on the 
true performance. To simplify notation, for smooth functions W, U we define 

G e [W, U] (t, x) = G £ [W] (t, x) - i \a (DW(t, x) - DU(t, x))\ 2 , (4.5) 

where Q e [W] is defined in flO). Using that £ [Fi] = £°[F 2 A/ ] = 0, 

e [Fi](:c) = e -o 2 D 2 F x (x) = -ec and g E [Ff](t,x) = ^a 2 D 2 F% 1 \t,x) = ea 2 a M (t). 

Since the problem is symmetric, it suffices to consider only x G [0, A). A key role is played by the 
weights associated with the exponential mollification of the subsolution, which take the forms 



p 2 ' (t,x;S) 



and 



pi(t,x;S) 



g 5 1 _|_ g (5- 1 2,-VW _)_ g~7 J 

To determine which of these dominate at any (t, x), the relative sizes of the functions F^it, x) 
and F\{x) are required, with smaller functions corresponding to more dominant weights. For 
this reason the solutions to F2±(t,x) = F\(x) play an important role, and especially the larger 
one identified in (|4.2p . See Figure [71 The following bounds on this root will be used to partition 
the domain in the analysis. Let z = x(c/Ma 2 ) 1 / 2 /2 and let R(t) denote the larger root in (|4.2p . 
Then we claim under (14.31) that 



2z < R(t) < 8z for all t G [0, T - £*] 

and uniformly in T. We recall the definition K = (2c/ M) + a 2 G (<r 2 ,oo) and that M > Ac/a 2 
so that if G (a 2 , 2a 2 ). Then the smallest possible value of R(t) satisfies 



a 2 x \2cK a 2 x I 2c 
TW > T^VMo 1 "^VM^ 
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while the largest satisfies 



a 2 x 



e~ ct * + 



2cK 



2c 



-2ct* 



Ma A Ma 2 



< 



a 2 x 



< 



2c 

~K \ Ma 2 

2c 



a 2 x 



(7- 



< x 4 



Ma 2 



Ma 2 



+ 



+ 



2cK \ 
Ma 4 J 

4c \ 



Ma 2 



8z, 



where the first inequality uses t* > -\ log j^p. We set H = lOz > 8z. 

In order to obtain bounds on the performance under the corresponding scheme, we need to 
bound Q e [U S ' r, ,U s ](t,x) from below in various regions. By (|4.5p 

g £ \U s >\U s ]{t,x) = g £ [U s "](t,x) (DU 5 *(t,x)-DU s (t,x)^\ 2 



gz[U s «](t,x)--r, 2 



aDU s (t,x) 



(4.6) 



We will use the notation 
and 



71 = G £ [F 1 ](x) = -ec 



7^(t) = G £ [F 2 M ](t,x)=ea 2 a M (t). 
Straightforward calculations and some algebra give 



G £ [U^](t,x) > (1 - r,)g s [U d ](t,x) + - (r, - r, 2 ) aDU d (t,x) 
For notational convenience, define 



Po(t,x) = a 2 



M,- 

P2 



\DF^ + \ 2 + p^~\DF^_\ 2 + pil^Fil 2 



p M' + DF 2 M + + P ^-DF 2 M _+p 1 DF 1 



(t,x) 



Note that by Jensen's inequality /3o(t,x) > 0. We next apply Lemma 14. II to g e [U s,r] ](t,x) (while 
suppressing the dependence on 5 in the notation for the p's), and use (|4.6p to get 



gt[U s «,u s ](t,x)>(l- v )l(l-- 6 



A)(t, x) + (1 - r,) p^ + (t, x) + pf "(t, x)J 7 f (*) 

2 



+ (1 - 77) Pl (i, x) 7 i + - (t? - 2r, 2 ) \aDU 5 (t, x 



(4.7) 



for all x G [-A, A] and t G [0, T - t*]. 

We will partition the domain according z = i(c/Ma 2 ) 1 / 2 /2 and iif = lOz. We consider three 
cases depending on whether x G [0, z], x G [2, -ff] or x G [^-4] if a; > 0. The case x < is 
symmetric. Before proceeding with the analysis for each of the cases, we give the definition of 
exponential negligibility, a concept used frequently in the rest of the paper. 
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Definition 4.2 A term is called exponential negligible if it is bounded above in absolute value 

_d 

by a quantity of the form ece e , where c < oo and d > 0. 

Lemma 4.3 Assume that (t,x) G [0, T — t*] x [0,2;], 5 > e and n < 1/2. Then, up to an 
exponentially negligible term 

g £ [u s ' r ',u s }(t,x)>o. 

Proof. In this region F\(x) > F^ + (t,x), and we claim that the inequality is in fact strict. We 
have 

F l{ x) - Fjf + (t,x) = ± [x 2 - x 2 ] - R _.^ t _ T) [x^ - x ]\ (4.8) 

For each fixed t this defines a concave function of x. At x = the value is minimized when 
t = T-t*. Using e" 2c '* < [2c/Mct 2 ] 4 < c/Mct 2 (since M > Ac/ a 2 ) and K = a 2 +2c/M, we obtain 
the strictly positive lower bound x 2 c [l/a 2 - I /{a 2 + c/M]] . Since F ± (2z) - F$f + (t, 2z) > 0, by 
concavity there is c\ > such that (|4.8p is bounded below by c\ for all (t, x) £ [0, T — t*] x [0, z\. 
Thus the term in (I4.7P involving the weight /?i is exponentially negligible. Since /3o(t,x) > 0, 
72^(t) > and 77 < 1/2, all other terms are non-negative, and the result follows. ■ 

Lemma 4.4 Assume that (t,x) £ [0,T — i*] x [5,-4], 5 > e and n < 1/4. Then letting e = 
cH 2 /3a 2 , we have that for all e G (0,£o) an( ^ an 2/ 6 [ £ /( £ + cH 2 /^ 2 ), 1/4], up to an 
exponentially negligible term 

g £ [U S ' T >,U s ](t,x)>0. (4.9) 

Proof. In this region F 2 A ^_(t,x) > -Fi(x), and it is straightforward that the terms associated 
with F£*_{t, x) are exponentially negligible. Note that x — > FJr+(t,x) — F\(x) is convex, recall 
that for each t G [0, T — t*] the largest value where the two functions agree is smaller than 
8z < H, and that 

DF&{t,x) - DF l{ x) = I - _ _^ T) (x - . (4.10) 

Inserting the largest root for the given t gives the value 



2cx / 2cK 2c 

e -2ct* 



K - a 2 e 2c ^ T ) V Ma A Ma 2 

A lower bound on the first term is 2cx/K. Using e~ ct < [2c/Ma 2 ] 2 , the definition of K, and 
Ac/Ma 2 < 1 to bound the second term from below produces the strictly positive lower bound 



for all t G [0, T — t*] and x > 82. Since H = IO2 > 82, this shows that there is C2 > such 
that F£f + (t,x) - F\(x) > c 2 for all (t,x) G [0, T — t*\ x [H, A] . It follows that terms involving 



r 2 
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p^'^it, x) are exponentially negligible. Since f3o(t, x) > and pi(t, x) = 1 up to an exponentially 
negligible term, 

g £ [U S '^ U 8 ](t, x) > (1 - v)Pi(t,xh° + Uv- 2V 2 ) ° 2 \pi(t,x)DF 1 (x)\ 2 



x z 



>-{l-i 1 )ec + 2(r,-2r l 2 )^ 
up to an exponentially negligible term. Choosing rj < 1/4 gives 

g £ [U s ",U s ](t,x) > -(i-^ec + ^x 2 



and for e small enough such that rj € [e/(e + cH 2 /a 2 ), 1/4], the last display is non-negative. For 
this interval to be nonempty imposes the constraint e < Eq = cH 2 /3a 2 . Hence in this region 
and up to an exponentially negligible term, 

g £ [u 5 >\u 5 ](t,x) > o. 



The final region is the most difficult, since F\(x)—F2 + {t, x) can be either positive or negative. 

Lemma 4.5 Assume that (t,x) 6 [0,T — t*] x [z, H\, rj < 1/4 and set 5 = 2e. TTien up to an 
exponentially negligible term 



g £ [u 5 *,u 5 ]{t,x) > I 



2ec 



AO. 



Proof. While terms corresponding to F^-it, x) are exponentially negligible in this region, since 

F 1 {x)-F^x) 



changes sign both p^'" 1 " an d Pi may be important. Since they are negligible we omit terms 
corresponding to F 2 A ^(t, x). 

By (|4.7p we have up to an exponentially negligible term 



g £ [u s *,u 5 ](t,x) >(i - r,)±Po(t,x) + (i - ri)fg> + (i,x) 72 M (i) + (i - v)piM-n 



i 



+ J (77- 2r? 2 ) a 2 /0 f' + (t l x)D J F 2 M (< l x) +pi(t >a; )£>F 1 (x 



(4.11) 



As noted previously (3o(t,x) > for all However, we will exploit the fact that 0o(t,x) = 

only for points (t,x) such that DF\{x) = DF^ 4 (t,x). We distinguish two cases depending on 
whether pi(t,x) > 1/2 or pi(t, x) < 1/2. 

Case I: pi(t,x) > 1/2. We know that Po(t,x) > and 7 2 Vf (i) > and can ignore those 
terms. Using + p\ = 1, the terms that remain are 

(1 - r/)pi(t,x) 7 i + ^ (77- 2t7 2 ) a 2 [ Pl (t, xf\DF x {x) - DFif + (t,x)\ 2 



+ 2 Pl (i,x) J DF 2 M + (t, a ;)(Z)F 1 (x) - DF™ + (t,x)) + \DF%.(t,x)\ 



M 



?M 



26 



March 5, 2013 



We claim that for (i, x) G [0, T - t*} x [z, H] 

DF$f + (t,x) (DFi (x) - DF 2 M + (t,x)) > 0. 

First, we note that e _rf * < (2c/Ma 2 ) 1/2 = (Ac/Ma 2 ) 1/2 /4 < 1/4, and thus xe ct * > Ax > H. 
Therefore 



DF 2 M + (t,x) 



< 



2ce 2c ^ 
K - a 2 e 2 < l - T ) 

2ce 2c(t-T) 



x — xe 



<t-T) 



H — xe 



ct* 



K - a 2 e 2 ^- T ) 
< 0. 

Second, by (p~TU|) . the definition of z and e~ ct * < (2c/Ma 2 ) 2 , we also have 
DFi{x) -DFif + (t,x) 



< 



< 



2c 




K 




a 2 


K - 


<T 2 e 2c (*- 


-T) 


2c 




K 




a 2 


K - 


a 2 e 2c (*" 


-T) 


2c 




K 




a 2 


K - 


a 2 e 2c (*" 


-T) 



( — xe 



-ct* 



X 



( 2c \ 2 X / C \V2 



< 0, 



where the last inequality uses 4c/Mct 2 < 1. We conclude that DF 2 M (t, x)(DF 1 (x)-DF 2 AI (t, x)) > 
0. Since pi(t,x) € (1/2, 1) and 77 < 1/4, we obtain the bound 



g £ [U s «,U 5 ](t,x) > -{l-^ec + ^a^DF^x) - DF 2 M + (t,x)\ 2 . 

lb ' 



(4.12) 



This gives a bound for Case I. 

Case II: p±(t,x) < 1/2. Here we will have to use /3o(t,x). Dropping other terms on the 
right that are not possibly negative, we obtain from (|4.1ip that 



g £ [u s «,u s ](t,x) > (1 - rj)-Mt,x) + (1 - v)pi(t,x)-n- 



1 



Omitting exponentially negligible terms, we note that 

?M i 2 



Po{t,x) =0* (1 - pi) \DF% + \" + pi \DFtf - \DF 2 M + + p 1 (DF 1 - DF^)\" (t,x 



jM \|2" 



<T 2 P1 



\DF, 



\DR 



M |2 



2DF 2 M + (DF 1 - DFi%) - p 1 (DF l - DF 2 A '^) 2 \ (i,x) 

= a 2 Pl (DF 1 - DF 2 M + ) [DF 1 + DF 2 M + - DF 2 M + - p l {DF l - DF%+)] (t,x) 
= a 2 Pl (l - Pl )(DF 1 - DF 2 A ^) 2 (t,x) 

> y 2 p 1 (DF 1 -DF 2 M + ) 2 (t,x), 
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where the last inequality uses pi(t, x) < 1/2, and so obtain 



gc[U^,U s ](t,x)>(l-r,) Pl 



a 2 \DF x (x) - DFif + (t,x)\ 2 - ec 



This gives a bound for Case II. 

Straightforward estimation gives, for all (t,x) S [0, T — t*] x [z, H], the lower bound 



DFl I + (t,x)-DF 1 (x) 



2c 



A" 



a 2 K - CT 2 e 2c (*- T ) 



r 2 

K' 



x _ ^- £e c(t-T) 



Using this bound and n < 1/4, we get a lower bound from (|4,12p in the form 

1 



g £ [u 5 ^,u 5 ](t,x) > 



^pa^DF^x) - DF^ + {t,x)\ 2 - ec 



> 



^L(z-xe c( - 
Aa 2 



ec 



Since this is less than the bound for Case II when both terms are negative, the conclusion of 
the lemma follows. ■ 

Theorem 4.6 Assume 5 = 2e,i] G {e/(e + cH 2 /a 2 ), 1/4), and that z 2 cr\ > Sea 2 . Let u be the 
control based on the function U s defined in J^.^p , i.e., u(t,x) = —aDU s (t,x). Then up to an 
exponentially negligible term, we have 



where 



-elog Q £ (0, 0; u) > 2Ii(e, n, T, x, M)l {T > t * } + 2J 2 (e, T)l {T<t * } 

1 



h(e,ri,T,x,M) = (1 - V )U%0, 0) + log 



I 2 (e,T) = 2L-cTe 



y/4ea 2 /cr] 



AO e 



and L = l^A 2 . 

Remark 4.7 Although the bound provided by Theorem takes a complicated form, it is im- 
portant to note that it does not degrade as T — >■ oo, and this is also reflected in the simulation 
data. Also, as noted previously there are natural scalings under which n — > and M — > oo as 
e — > 0. Using the bound from below given in Lemma \4-l\ and the explicit form of F^ ±(0,0), we 
obtain 



U s (0,0) > 



2c_ 

M 



+ a A — a z e 



2 e -2cT 



x 2 + (2L 



c X.2 



S log 3 



Hence, we obtain the rate of decay 



cx 

2L + ^ 
o~ 



-2cT 



1 - e 



-2cT 



uniformly inT as e — > 0. 



28 



March 5, 2013 



Proof of Theorem 14.61 The starting point is the representation (|3.ip [but rewritten for the 
more general process model and with time dependent u], which is valid for every e > 0. We can 
restrict to v such that t £ <T w.p.l, obtaining 



elogQ £ (0,0;u) 



inf 

v£A:t £ <T w.p.l 



E. 



0,0 



v(s) 2 ds 



u(s,X £ (s)) 2 ds 



(4.13) 



We can also assume T > t* since the bound is straightforward otherwise. We recall that under 
(|4.3p the subsolution property is preserved for U s (t,x) at T — t*, i.e., that 

U s (T-t*,x) < Ft(x) for all x £ [—A, A] . 

Next consider any control in the representation such that f £ < T w.p.l. We will apply Ito's 
formula separately over the intervals [0, (T — t*) A f £ ) and [(T — t*) A t £ ,t £ ) and also use the 
boundary condition U s ^(t,±A) < Fx(±A) < for t £ [0,T]. Since U S,V (T - t*,x) < F^x) we 
obtain 



-U s '^(0, 0) > E ,o \u S ' v (f e , X £ {f £ )) - U S ^{{T - t*) A f £ -,X £ ((T - t*) A f £ -)) 



+ E 



o.o 



U 5,r, ([T - t* ) A f £ , X £ ((T - t*) A f £ )) - C7 5 '"(0, 0) 



{r E >T-t*} 

(4.14) 



Using Lemma 16.11 and recalling the definition of Q £ [W, U] in (|4.5p , the contribution from 
[0, (T-t*) Af £ ) gives 



E ,o 
= E n 



U S ' V ((T - t*) A f £ ,X £ ((T - t*) A f £ )) - U^iO, 0) 



(T-t*)Af e 



g £ [U s ^,U%,X £ (s))ds-^v(s) 2 ds + ^u(s,X £ (s)) 2 



ds. 



An analogous formula holds for [(T — t*) A f £ ,f £ ), save that since 

jj8, v = jj8 = the term 

Q £ \U 5,ri , U 5 } simplifies to t/ e [Fi]. Rearranging and using (|4.14p . 



E 



o.o 



v(s) 2 ds — 

(T-f)Ar E 



u{s,X £ (s)) 2 ds 



> 2U s > r >(0,0) 



+ E n 



2g £ [U s ^,U d ](s,X £ (s))ds + E 



0,0 



(T-t*)Af e 



We now replace each term by a lower bound, using Lemmas H31 IH and S3] for 2Q £ [U 6 ^, U 5 } 
Since the bounds are independent of the control process v, the representation (|4.13p implies 



-elogQ e (0,0;n) > 2(1 - 7])U S (0, 0) + 



2 

C T] 

2^2 



z — xe 



c(s-T) 



2ec 



ds — t*2ec, 



where J are the times in [0, T — t*] where the integrand is negative. 

We next use the constraint z 2 crj > 8ea 2 , which guarantees that for T — s sufficiently large 
the integrand is in fact positive. Let 



2 

c rj 
2^2 



z — xe 



-cb 



2ec or b 



1 



log 
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Then since the integrand is only negative for s > T — b, we obtain the lower bound 
— [(b — t*) V 0] 2ec for the integral. Adding the remaining —t*2ec then gives the result as stated. 
■ 

4.4 Simulation results for the linear problem 

In this subsection we present simulation data for the linear problem and make several comments 
on the application of the algorithm. For comparison purposes, we consider the same two sided 
problem corresponding to the data from Tables Q3 [2 [5] and [7J Thus we consider the small noise 
diffusion process with drift b(x) = —V'(x), where V(x) = \x 2 , diffusion coefficient y/e, and 
starting from the stable equilibrium point x = 0. The goal is to estimate the probability of 
exiting the set (—1,1) by a given time T. 

As discussed in Subsection 14.21 the change of measure for the importance sampling scheme 
is based on the subsolution (j4.4|) . In order to apply it to a given pair (e, T), one needs to choose 
the parameters (x,M,t*,8). Before presenting simulation data, we comment on these choices. 

The analysis in Subsection 14.31 assumes t* > — flog 17% and 5 = 2s, and we will take 
t* = — - log -tjS^ . As noted before Lemmas I4.3H4.51 it is natural to allow quantities such as z and 
H, which characterize the region where the solution to the LQR replaces the subsolution based 
on the quasipotential, to depend on e. One would like the width of this region to scale like e K , 
with k € (0, 1/2], which in turn suggests that M scale like 2x 2 c/a 2 e 2K . However, the exponential 
negligibility of certain terms that holds when parameters such as z, H and M are independent 
of e need not hold when they depend on e. For example, the exponential negligibility of the 
term (1 — rj)px(t, x)^ appearing in Lemma 14.31 should be examined. 

Recall that the exponential rate of decay of terms like (1 — r/)px (tjX^i is bounded by the 
smallest value of F±(x) — F^At,x). A lower bound of the form x 2 c [l/<7 2 — l/[<7 2 + c/M]] was 
obtained in the proof of Lemma 14.31 Inserting the given scaling and approximating for small e 
gives ce /2a 2 , and upon dividing by 5 = 2e gives the exponent cs 2k 1 /Aa 2 . Hence exponential 
negligibility requires k G (0, 1/2), with smaller values of k giving a faster rate of decay. Note 
however that the analysis assumes M > Ac/ a 2 and z 2 cq > Sea 2 . With regard to the condition 
M > Ac/a 2 , inserting the given scaling for M we get the constraint x 2 /2e 2k > 1. This is clearly 
satisfied for small e > if x is of order 1. One may also take here x to be of order e x and the 
constraint will be satisfied for small e if A < k. We also remark here that for the nonlinear 
problem, the condition M > Ac/ a 2 needs to be strengthened to M > Ac/ a 2 . With regard 
to the condition z 2 cq > 8ea 2 , inserting the given scaling for M and recalling the definition 
z = x{c/Ma 2 ) 1 / 2 /2, we obtain the constraint e 2/t_1 > 6Aa 2 /rjc. This constraint is satisfied for 
small enough e when k € (0, 1/2), and moreover one can allow 7/ — > as e — > 0. 

Below we present simulation data for various choices of the parameters as indicated in the 
corresponding tables. In Table EJ estimated probabilities are reported when M = A and x = 1, 
whereas the related relative error per sample estimates are reported in Table [9] (since the relative 
errors are consistently smaller we now round to the nearest 1/10). In Tables PTOl and PTT1 relative 
errors per sample estimates are reported for combinations of (M,x) that depend on e. The 
related probability estimates are almost identical to those in Table Note that the degradation 
in performance as T is gets larger observed previously, is no longer present. This agrees with 
the theoretical performance bound appearing in Theorem 14.61 
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e\T 


1.5 


2.5 


5 


7 


10 


14 


18 


23 


0.20 


9.1e- 03 


2.3e - 02 


5.7e- 02 


8.3e- 02 


1.2e - 01 


1.7e-01 


2.1e-01 


2.7e-01 


0.16 


2.2e- 03 


6.6e - 03 


1.8e- 02 


2.7e- 02 


4.0e-02 


5.7e- 02 


7.4e - 02 


9.5e-02 


0.13 


5.1e-04 


1.6e - 03 


4.6e - 03 


6.9e-03 


Lie -02 


1.5e- 02 


2.0e - 02 


2.6e- 02 


0.11 


Lie -04 


3.9e - 04 


1.2e-03 


1.8e-03 


2.8e - 03 


4.1e- 03 


5.4e - 03 


7.0e-03 


0.09 


1.3e- 05 


5.2e - 05 


1.7e-04 


2.6e- 04 


4.1e-04 


5.9e- 04 


7.8e - 04 


1.0e-03 


0.07 


4.3e- 07 


2.2e-06 


7.6e - 06 


1.2e-05 


1.9e-05 


2.8e- 05 


3.7e - 05 


4.8e- 05 


0.05 


9.7e-10 


6.9e - 09 


2.8e- 08 


4.4e - 08 


7.0e - 08 


Lie -07 


1.4e-07 


1.8e- 07 



Table 8: Estimated values for different pairs (e,T). M = 4, x = 1. 



e\T 


1.5 


2.5 


5 


7 


10 


14 


18 


23 


0.20 


1.7 


1.4 


1.0 


0.9 


0.6 


0.6 


0.7 


0.9 


0.16 


2.1 


1.8 


1.2 


1.0 


0.8 


0.7 


0.7 


0.8 


0.13 


2.4 


2.2 


1.6 


1.4 


1.1 


0.8 


0.8 


0.8 


0.11 


2.9 


2.7 


2.0 


1.6 


1.3 


1.1 


1.0 


0.9 


0.09 


3.6 


3.6 


2.7 


2.3 


1.8 


1.5 


1.3 


1.2 


0.07 


4.9 


5.7 


4.2 


3.4 


2.9 


2.4 


2.1 


1.9 


0.05 


8.9 


13.0 


9.9 


8.3 


6.8 


5.7 


5.0 


4.4 



Table 9: Relative errors per sample for different pairs (s,T). M = 4 and x = 1. 

5 The non-linear one dimensional problem 

In this section, we extend the construction of Section|4]to the general non-linear one dimensional 
setting. We also generalize the notation and allow the stable equilibrium to be an arbitrary 
point xq. Consider the process model (jl.ip and assume that b,a £ C 1 (R) and that b(x ) = 0, 
b'(xo) < and a 2 (x) > a\ > for all x G M.. Thus we can write b(x) = —V'(x) with unique 
local minimum at x = xo and V (xq) > 0. It is easy to see that the quasipotential with respect 
to the equilibrium point xq takes the form 

S(x ,x)= I -2 10 Z dz. 

Jxo ^ ( Z ) 

The problem of interest is to estimate the exit probability 

6 £ = F X0 {X s hits Ai or A 2 before time T} , 

where xq is the initial (and rest) point such that xo 6 (AijAz). Furthermore, we assume that 
b(x) < for all x £ {x ,A 2 ] and b(x) > for all x G [Ai,x ). Set L = \ \S{xq,A\) V%,Ai)]. 

The approach to the nonlinear problem is to merge the linearized dynamics around the 
equilibrium point with the subsolution based on the quasipotential. This subsolution is 

F x {x) = 2L- S(x ,x). 

Observe that the second order approximation to this function around the equilibrium point xq 
is 

F 1 (x) = 2L-^(x-x ) 2 , 
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e\T 


2.5 


5 


7 


10 


14 


18 


23 


0.20 


1.3 


0.9 


0.7 


0.6 


0.6 


0.7 


1.0 


0.16 


1.5 


1.1 


0.8 


0.7 


0.7 


0.7 


0.9 


0.13 


1.7 


1.2 


1.0 


0.8 


0.7 


0.7 


0.8 


0.11 


1.8 


1.4 


1.2 


0.9 


0.8 


0.7 


0.8 


0.09 


2.0 


1.6 


1.3 


1.1 


0.9 


0.8 


0.8 


0.07 


2.2 


1.9 


1.6 


1.3 


1.1 


1.0 


0.9 


0.05 


2.4 


2.5 


2.1 


1.7 


1.5 


1.3 


1.1 



Table 10: Relative errors per sample for different pairs (e,T). M = -j= and x = 1. 



e\T 


2.5 


5 


7 


10 


14 


18 


23 


0.20 


1.5 


0.9 


0.7 


0.7 


0.9 


1.1 


1.3 


0.16 


1.7 


1.0 


0.8 


0.7 


0.8 


1.0 


1.2 


0.13 


1.8 


1.1 


0.8 


0.8 


0.8 


1.0 


1.0 


0.11 


1.9 


1.1 


0.9 


0.7 


0.9 


0.9 


1.1 


0.09 


2.2 


1.2 


0.9 


0.8 


0.8 


0.9 


1.1 


0.07 


2.4 


1.3 


1.0 


0.8 


0.8 


0.9 


1.1 


0.05 


2.9 


1.5 


1.1 


0.9 


0.8 


0.9 


1.0 



Table 11: Relative errors per sample for different pairs (e,T). M = and x = e 015 . 

where c = — b (xo) and a = ct(xq). Let x+, x_ be such that x + — xq = xq — x_ . The appropriate 
translated version of F^L. is 

F 2 ^(t,x) = a M (t) ((* - x ) - (£+ - x )e" c(i - T) ) 2 + F 1 {x + ), 

and F${_ {t, x - x) = F$f + (t, x + x). 

The subsolution for times less than T — t* will be the mollification of F^it, x) A (t, x) A 
Fi(x). Note that since Fi(x) agrees with F\{x) up to second order it is still the case that 
F^it^x) A F2^_(t,x) will be smallest near x = xq. Letting 

U s (t,x) = -Slog (e-W+fc*) +e-^2 M -(M) +e 4^W) (5.1) 

and 

U ^ x >-\ U s (t,x), t<T-t* ' {b - 2) 
the suggested importance sampling control that is used for the simulation is 

u(t,x) = -a(x)DU s (t,x). 

Notice that this construction reduces to the construction of the linear case if the potential is 
indeed quadratic, since then Fi(x) = F\(x). 
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In Subsection 15.11 we present simulation data for the nonlinear problem, demonstrating the 
effectiveness of the suggested change of measure. The analysis and the theoretical bound for 
the performance of this scheme are completely analogous to the linear problem, modulo the 
additional error coming from the linearization of the dynamics in the neighborhood of the stable 
equilibrium point. In Subsection 15.21 we rigorously analyze the performance of this algorithm. 

5.1 Simulation results for nonlinear problem 

In this subsection, we present simulation data for the nonlinear problem. We take the drift to 
be b(x) = —V'(x), where the potential function is V(x) = \{x 2 — l) 2 . This potential function 
has two stable points at —1 and at +1, and an unstable equilibrium at 0. We assume that the 
starting point is at the right equilibrium point xq = — 1 and the exit set is the level set of the 
potential function D = {x : V{x) < L}, with L = 0.45. Thus exit occurs from either of the 
points Ai = -1.40 or A 2 = -0.23. 

Notice that the local quadratic approximation around the equilibrium point is V q (x) = 
t}c(x + l) 2 with c = 4. Moreover, we have chosen, for simplicity, the diffusion coefficient to be 
constant <j(x) = 1. N = 10 7 independent trajectories were used for the simulations. 

We first investigate the performance of a change of measure based on the quasipotential 
subsolution. Thus we change the measure via the control u(x) = —DF\(x). Estimated values 
and the corresponding estimated relative errors per sample for several values of (e, T) are in 
Tables [T2] and [T3J respectively. 



e | T 


0.5 


1 


1.5 


2.5 


4 


5 


6 


8 


10 


0.14 


3.01e - 03 


8.21e - 03 


1.36e - 02 


2.48e - 02 


5.82e - 02 


6.22e - 02 


4. lie - 02 


4.29e - 02 


5.31e - 02 


0.12 


1.03e - 03 


2.91e - 03 


4.92e - 03 


8.95e - 03 


1.46e - 02 


1.73e - 02 


2.10c - 02 


l.Sle - 02 


1.72e - 02 


0.09 


S.27e - 05 


2.52e - 04 


4.35e - 04 


8.10e - 04 


1.33e - 03 


1.53e - 03 


1.65e - 03 


1.58e - 03 


1.57<? - 03 


0.07 


4.60e - 06 


1.49e - 05 


2.64e - 05 


4.97e - 05 


7.63e - 05 


l.lle - 04 


9.74e - 05 


1.21e - 04 


9.19e - 05 


0.05 


2.49e - 08 


8.91e - 08 


1.62e - 07 


3.15e - 07 


5.03e - 07 


6.19e - 07 


6.78e - 07 


5.85e - 07 


6.29e - 07 


0.08 


1.25e - 13 


5.40e - 13 


1.03e - 12 


2.10e - 12 


3.80e - 12 


6.12e - 12 


4.87e - 12 


1.04e - 11 


5.18e - 12 



Table 12: Using the subsolution based on quasipotential throughout. Estimated values for 
different pairs (s,T). 



e\T 


0.5 


1 


1.5 


2.5 


4 


5 


6 


8 


10 


0.14 


2 


2 


4 


21 


799 


953 


127 


169 


488 


0.12 


2 


2 


4 


18 


125 


315 


649 


368 


301 


0.09 


2 


2 


4 


20 


143 


155 


311 


173 


288 


0.07 


2 


2 


4 


17 


68 


433 


192 


540 


272 


0.05 


2 


2 


4 


16 


77 


296 


410 


148 


287 


0.03 


2 


2 


3 


14 


160 


638 


347 


1933 


317 



Table 13: Using subsolution based on quasipotential throughout. Relative errors per sample for 
different pairs (s,T). 

As we see from Table [131 even though the quasipotential subsolution performs relatively well 
for small values of T, there is a clear degradation of performance as T gets larger. It is also 
interesting to note that the degradation is uniform across all values of e for the same value of 
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T. This behavior parallels what was observed for the linear problem. As was mentioned there, 
the large per sample relative errors for T > 2.5 should not be taken as being accurate, but just 
indicative of poor performance. 

Next we investigate how the suggested change of measure performs. To apply the control, 
we choose values for the parameters (x, M, t* , 5) according to the discussion in Subsection 14.41 
However, for reasons that will become clearer in the proof of Lemma 15.61 we need to strengthen 
the condition M > 4c/ a 2 to M > Ac/ a 2 , say M > 5c/ a 2 . When we link the other parameters 
to e the z — > as e — > 0, and because of this we do not explicitly take into account the error 
from the approximation around the neighborhood of the rest point of the true dynamics by its 
linearization when selecting the parameters for the implementation of the scheme. 

Estimated values and corresponding estimated relative errors of the exit probabilities of 
interest for different pairs (e, T) and different combinations values for z are in Tables [T4l through 
[181 Estimated relative errors for M = -§§4r with k = 0.4 and x = 0.4 are reported in Table [HI 
whereas the related relative errors are reported in Table fl~5l In Tables [TBI and \T7\ we report only 
estimated relative errors for the same value of k but for x = 0.5 and x = 1 respectively. The 
related probability estimates are almost identical to those of Table [Til so they are not repeated. 

Note that for Table[JBJ t* > T when T = 0.5 and for e < 0.05. Similarly, for TableQH t* > T 
when T = 0.5 and for T = 1 when e < 0.09. For such values, the quasipotential subsolution is 
being used everywhere (see (|5.2p ) and the numerical results for these values agree with those 
from Table [U 

In order to illustrate the effect when the linear approximation is used over a relatively large 
region, the data in Table [18] are estimated relative errors when M is considerably smaller than 
before, and thus z is considerably larger. In particular, we have taken k = 0.25 and x = 1. 
Comparing Tables [T7] and [T8l we notice that if the error from the linearization is not confined 
to a small enough region then the algorithm degrades in e though it appears stable in T. This 
is consistent with the theoretical results, which imply a uniformity in T but only logarithmic 
optimality in e. That said, one would like to minimize errors associated with linearization as far 
as possible. As noted in Subsection 14. 4| one should choose the scaling parameter k E (0, 1/2). 
However, with the nonlinear problem minimizing the region over which the approximation is 
used calls for larger n, and so one want it close to but not equal to 1/2. For the problems 
considered here, k = .4 worked well. 



e | T 


0.5 


1 


1.5 


2.5 


4 


5 


6 


8 


10 


13 


0.14 


3.07e - 03 


8.36e - 03 


1.39e - 02 


2.49e - 02 


4. lie - 02 


5.12e - 02 


6.24e - 02 


8.33e - 02 


1.04e - 01 


1.34e - 01 


0.12 


1.05e - 03 


2.97e - 03 


4.99e - 03 


9.06e - 03 


1.52e - 02 


1.91e - 02 


2.32e - 02 


3.12e - 02 


3.92e - 02 


5.09e - 03 


0.09 


8.36e - 05 


2.53e - 04 


4.39e - 04 


8.12e - 04 


1.38e - 03 


1.76e - 03 


2.14e - 03 


2.89e - 03 


3.65e - 03 


4.78e - 03 


0.08 


2.36e - 05 


7.41e - 05 


1.29e - 04 


2.41e - 04 


4. lie - 04 


5.24e - 04 


6.38e - 04 


8.63e - 04 


1.08e - 03 


1.43e - 03 


0.07 


4.60e - 06 


1.49e - 05 


2.65e - 05 


5.01e - 05 


8.55e - 05 


1.09e - 04 


1.33e - 04 


l.Sle - 04 


2.28e - 04 


2.99e - 04 


0.05 


2.48e - 08 


8.91e - 08 


1.63e - 07 


3.16e - 07 


5.44e - 07 


6.99e - 07 


8.51e - 07 


1.16e - 00 


1.47e - 00 


1.92e - 00 


0.04 


2.57e - 10 


9.89e - 10 


1.85e - 09 


3.64e - 09 


6.35e - 09 


8.15e - 09 


l.Ole - 08 


1.36e - 08 


1.72e - 08 


2.26e - 08 


0.03 


1.25e - 13 


5.38e - 13 


1.03e - 12 


2.08e - 12 


3.68e - 12 


4.74e - 12 


5.80e - 12 


7.94e - 12 


l.Ole - 11 


1.32e - 11 



Table 14: Estimated probability values for different pairs (e, T) using the exponential mollifica- 
tion. Parameter choices M = ^§-^ with k = 0.4 and x = 0.4 
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e\T 


0.5 


1 


1.5 


2.5 


4 


5 


6 


8 


10 


13 


0.14 


3.7 


2.3 


1.8 


1.3 


1.1 


1.0 


1.0 


1.1 


1.3 


1.7 


0.12 


4.2 


2.6 


2.0 


1.5 


1.2 


1.1 


1.1 


1.1 


1.2 


1.5 


0.09 


5.1 


3.3 


2.6 


1.9 


1.5 


1.3 


1.2 


1.2 


1.2 


1.2 


0.08 


5.3 


3.7 


2.8 


2.1 


1.8 


1.4 


1.3 


1.2 


1.2 


1.2 


0.07 


5.5 


4.1 


3.2 


2.3 


1.8 


1.6 


1.5 


1.3 


1.2 


1.2 


0.05 


4.9 


5.4 


4.5 


3.2 


2.5 


2.2 


2.0 


1.7 


1.6 


1.3 


0.04 


3.2 


6.5 


5.4 


4.1 


3.2 


2.8 


2.6 


2.2 


2.0 


1.8 


0.03 


2.8 


8.0 


7.3 


5.6 


4.4 


3.9 


3.5 


3.0 


2.7 


2.4 



Table 15: Relative errors per sample for different pairs (e, T) using the exponential mollification. 
Parameter choices M = -^4^ with k = 0.4 and x = 0.4. 



e\T 


0.5 


1 


1.5 


2.5 


4 


5 


6 


8 


10 


13 


0.14 


2.7 


2.5 


2.0 


1.5 


1.2 


1.0 


1.0 


1.0 


1.0 


1.1 


0.12 


2.6 


2.9 


2.4 


1.8 


1.4 


1.2 


1.1 


1.1 


1.1 


1.1 


0.09 


2.1 


3.7 


3.1 


2.4 


1.9 


1.7 


1.6 


1.4 


1.3 


1.2 


0.08 


1.9 


4.1 


3.5 


2.7 


2.1 


1.8 


1.8 


1.5 


1.4 


1.3 


0.07 


1.9 


4.5 


3.9 


3.0 


2.4 


2.1 


1.9 


1.7 


1.5 


1.4 


0.05 


1.8 


5.5 


5.2 


4.2 


3.2 


2.9 


2.7 


2.3 


2.0 


1.9 


0.04 


1.7 


6.0 


6.4 


5.2 


4.1 


3.7 


3.4 


2.9 


2.7 


2.3 


0.03 


2.0 


5.9 


8.2 


6.9 


5.6 


4.9 


4.5 


3.9 


3.6 


3.1 



Table 16: Relative errors per sample for different pairs (e, T) using the exponential mollification. 
Parameter choices M = with k = 0.4 and x = 0.5. 

5.2 Analysis of the simulation scheme for the nonlinear problem 

In this subsection, we present the theoretical analysis of the simulation scheme for general one- 
dimensional non-linear dynamics and provide rigorous bounds on performance. As for the linear 
case, the analysis is valid for e > without degradation as T — > oo. The analysis and the 
theoretical bound for the performance of this scheme are completely analogous to the linear 
problem, modulo the additional error coming from the linearization of the dynamics in the 
neighborhood of the stable equilibrium point. 

To distinguish between the linear and the nonlinear problem we need to introduce some 
notation. For a function W € C 1,2 ([0,T] x M) we define the operator 

g £ [W](t,x) = W t (t,x) + U(x,DW(t,x)) + ^a 2 (x)D 2 W{t,x), W(x,p) = b(x)p-- \a{x)p\ 2 . 

In analogy to (|4.5p . for smooth functions W, U, we define 

G e [W,U](t,x) = e [W](t,x) - i \a(x) (DW(t,x) - DU(t,x))\ 2 . 
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e\T 


0.5 


1 


1.5 


2.5 


4 


5 


6 


8 


10 


13 


0.14 


1.5 


2.4 


2.6 


2.2 


1.8 


1.6 


1.5 


1.3 


1.2 


1.0 


0.12 


1.5 


2.3 


3.0 


2.7 


2.2 


2.0 


1.9 


1.6 


1.4 


1.3 


0.09 


1.6 


2.2 


3.9 


3.8 


3.2 


2.9 


2.7 


2.4 


2.1 


1.9 


0.08 


1.6 


2.1 


4.1 


4.2 


3.7 


3.3 


3.1 


2.7 


2.5 


2.1 


0.07 


1.8 


2.1 


4.2 


4.7 


4.0 


3.7 


3.4 


3.0 


2.7 


2.4 


0.05 


1.8 


2.1 


4.4 


6.0 


5.3 


4.8 


4.5 


3.9 


3.6 


3.2 


0.04 


1.9 


2.1 


4.8 


7.1 


6.3 


5.8 


5.5 


4.8 


4.3 


3.9 


0.03 


2.0 


2.1 


3.5 


8.7 


8.2 


7.5 


6.9 


6.2 


5.6 


4.9 



Table 17: Relative errors per sample for different pairs (e, T) using the exponential mollification. 
Parameter choices M = with k = 0.4 and x = 1. 



e\T 


0.5 


1 


1.5 


2.5 


4 


5 


6 


8 


10 


13 


0.14 


1.5 


5 


5 


4 


3 


3 


3 


2 


2 


2 


0.12 


1.5 


7 


7 


6 


5 


5 


4 


4 


3 


3 


0.09 


1.6 


14 


19 


17 


14 


13 


12 


11 


9 


8 


0.08 


1.6 


21 


30 


28 


23 


22 


20 


17 


15 


14 


0.07 


1.8 


31 


54 


51 


42 


39 


36 


31 


28 


25 


0.05 


1.8 


36 


276 


278 


214 


220 


185 


153 


148 


128 


0.04 


1.9 


5 


568 


577 


665 


616 


507 


400 


415 


374 


0.03 


2.0 


3 


302 


60 


190 


39 


1878 


1485 


96 


1562 



Table 18: Relative errors per sample for different pairs (e,T) using the exponential mollification. 
Parameter choices M = with k = 0.25 and x = 1. 

Moreover, setting c = —b (xo) > and a = <t(xq) we recall that 

g £ [W](t,x) = W t (t,x)+U(x, DW(t,x)) + ^a 2 D 2 W(t,x), M(x,p) = -cxp - ^ \ap\ 2 . 

The operators with bars correspond to the nonlinear problem, whereas the operators without 
bars give the corresponding first order approximations. We define an operator measuring the 
error from the approximation by 

R £ [W] {t,x) = G £ [W] (t, x) - Q £ [W] (t, x) (5.3) 
Moreover, since b(x) and <r(x) are C 1 (M), we can write for any i£K 

b(x) = b(x ) + b'(x )(x - x ) + Ri(x), 
a{x) = a(x ) + R 2 (x), 

where Ri(x)\x\~ 2 and it^^)!^ -1 are locally bounded. By assumption we have that b(xo) = 
and a 2 (x) > 0. 

As with the linear problem, the subsolution used for the analysis is based on the 5— exponential 
mollification (|5.ip reduced by the multiplicative factor (1 — rf). We recall that this differs from 
the subsolution used for the design, which has 7] = 0. 
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Next we proceed with the mathematical analysis of the scheme. The analysis is parallel to 
what was done for the linear problem, modulo adjustments due to the linearization of the dy- 
namics in the neighborhood of the stable point, and therefore we mainly focus on the differences. 
In order to simplify the notation we assume without loss of generality (as it was done in the 
linear problem) that the stable equilibrium is xq = 0. We write x+ = — £_ = x, and for nota- 
tional convenience assume that A 2 = — A\ = A. As in the linear problem, z = x(c/Ma 2 ) 1 ! 2 /2 
and H = lOz. 

The following lemma bounds the error from the approximation. 

Lemma 5.1 Consider (t, x) G [0, T — t*] x [0, z] . Then, for z < min{l, A} 

\R £ [Fi I ](t,x)\<2a M (t)(z + xe- c{t - T) ) sup |6(x)+cac| 

xe[o,z] 



+ 



sup \a 2 (x) — a 2 
xe[o,z] 

=(t-T) w 2 , I n M (±\( , * - C (t-T)^^ 2 „ , 



2 (a M {t)(z + xe- c (*- T ))) + ea M (t) 
< C [a M {t){z + x e- c{t - T) )z 2 + (a M (t)(z + xe 



z + a M {t)ez 



where 



In addition 



Co 



sup 

x£[0,A] 



^M + ^M| 2 , + ft (x) 



sup \F ± (x) - Fi(s)| < Ciz 3 , 
xe[o,g] 

where C\ is a fixed constant. 

Proof. Since by assumption 6(0) = and a 2 (x) > 0, for (t,x) G [0, T] x K 

R £ [W]{t, x) = M(x, DW(t, x)) - W(x, DW(t, x)) + |(a 2 (x) - a 2 )D 2 W{t, x) 
= Ri(x)DW(t,x) - - (\a(x)DW{t,x)\ 2 - \aDW{t,x)\ 2 

+ £ -(a 2 (x)-a 2 )D 2 W(t,x) 
= Ri(x)DW(t, x) - ^R 2 (x) (2a + R 2 (x)) \DW(t, x)\ 2 



+ -R 2 (x) (2a + R 2 (x)) D 2 W(t, x). 



For all x G [0, A], we have 



\Ri(x)\ < C\x\ 2 and |i2 2 (x)| < C\x\ 



(5.4) 



for a constant C that depends on A. The bound follows by setting W(t, x) = FJ? (t, x) and using 
DF^ (t,x) = 2a M (t)(x — xe~ c ^~ T ^), and the proof of the first part of the lemma is concluded. 
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The proof of the second part goes as follows. Observe that 

F^x) = 2L-S(0,x) 

f x b(y) 

Jo v 2 (y) 
= 2L + 2 L 



-cy + Ri(y) 



dy 



lo {a + R 2 {y)Y Jo a 2 (a + R 2 {y)) 2 



The last display and the bounds in ()5.4p imply that 

sup \Fi(x) - Fi(x)\ < Ciz 3 
xe[o,z] 

which concludes the proof of the lemma. ■ 

For notational convenience we identify the quantity appearing in the upper bound for 
|-R e [-Fr^](i, x)\ as given in Lemma 15.11 



r(e, x, M, t) = a M (t){z + xe'^'^z 2 + (a M (t)( 



z + xe 



-c(t-T)- 



z + a M (t)ez. 



(5.5) 



The following lemma shows that the error term induced by the local approximation of the 
dynamics in the neighborhood of the stable equilibrium point does not degrade as T gets large. 



Lemma 5.2 We have that 

rT-f 



r(e,x,M,t)dt = Ji(t*,T, M)z s + J 2 (t* , T, M)z x + J3 (t* , T, M)zx l + J 4 (f , T, M)ez 







where, letting K 



^ + a 2 



Jx(t*,T,M) 
J 2 (t*,T,M) 



1 

1 



(7- 



K - a 2 e~ 2cT _c_ 
g K - a 2 e~ 2ct * + 2a 4 



K 



K 



1-^ 



log 



+ 



2cr 4 



2a 2 e 



2^-ct* 



J,{t\T,M) = — A 



1 + -2_p~ ct * 
1 _ _2_ P -ct* 

2(7 2 e- cT " 
iif _ a 2 e~ 2cT 
a 2 

K - a 2 e~ 2ct * K - a 2 e~ 2cT 
K - a 2 e~ 2cT 



K - a 2 e- 2ct * 

cT 



K -a 2 e 



2 c -2cT 



log 



1 + VK e ~ 



1 



7T 



-cT 



.2 



O" 



/re particular, lim 



/ Q T * r(e, x, M, t)dt < 00. 



T— > 00 
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The proof of this lemma follows by straightforward integration of r(e, x, M, t). Moreover, we 
note that 

Ji(t*,T, M) = 0(1) as M ->■ oo for all i = 1, 2, 3, 4 
and that the definition of 2 = xic/Ma 2 ) 1 ! 2 /2 implies 

r(e, x, M, t)dt = O (z 3 + z 2 x + zx 2 + ez) , uniformly in T < oo, as M — > oo. 

Remark 5.3 The following three lemmas are analogous to Lemmas I4.3ti4.5l from the linear case. 
The important difference between the nonlinear and the linear case is that the statements involve 
approximation errors and the statements hold if one confines the linearized dynamics to a small 
neighborhood of the equilibrium point as dictated by the sizes of t* and M, or equivalently by 
t* and z. Due to the natural scaling of M in terms of e as indicated in Subsection 14.41 as e gets 
smaller, z will get smaller and be confined to a sufficiently small region that the statements of 
the lemmas are valid. However, the lemmas below are stated for z sufficiently small, without 
referencing to the natural scaling used in the simulation algorithm. 



Lemma 5.4 Assume that (t,x) G [0, T — t*] x [0, z\, 5 > e and r/ < 1/2. Then, for sufficiently 
small z we have up to an exponentially negligible term 

&[U 5 '\U 5 ]{t,x) > -(l- V )C r(e,x,M,t). 

Proof. As in the proof of Lemma 14.31 we need to show that in this region F\(x) — FJ^At, x) is 
bounded by below away from zero. We have 



F 1 (x)-F 2 R ^(t,x) = x 2 ^ 



2 -^\dy-a M (t) (x-xe-^ 2 



At x = 0, the value is minimized at t = T — t* and, as in Lemma 14.31 we obtain the lower 
bound x 2 c[l/o- 2 - I /[a 2 + c/M]] > 0. For x > we use the decomposition 



Fi(x) - F 2 A ^(t, x) = [F x (x) - F 2 M + (t, x)] + [Fi(x) - F^x)] . (5.6) 

Using the inequality e~ 2ct * < fec/Ma 2 ] 4 < c/Ma 2 , the definition of z and equation (]4.8p . 
we obtain that for all (t, x) £ [0, T - t*] x [0, z] 



Fl (x)-F 2 ^(t,x)>^[x 2 -z 2 ] 



> 



M 



ze 



-ct* 



:r 2 



c , 



Ma 2 + c/M 
3cr 2 - c/M 



ce 



X J 

-2ct* 



a 2 + c/M 



c/M 



c 9 
= 2^z 2 
o~ 



a 2 + c/M o 2 + c/M 
2a 2 + 3c/ 'M c/M 



a 2 + c/M a 2 + c/M 
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Hence, this term is of order z 2 . Moreover, by Lemma 15. 11 we also have that 

sup \F x {x) < C lZ 3 . 

xe[o,z] 

Thus, for sufficiently small z the first term on the right hand side of (|5.6p dominates the second 
term. Hence the difference F\(x) — F 2 4 + (t,x) is bounded from below away from zero if z is 
sufficiently small, and the term involving the weight p\ is exponentially negligible. 

We next use that /3 (t,x) > 0, J^(t) > and rj < 1/2, LemmaEH and also ([53)1 and ([53]) . 
These imply that up to an exponentially negligible term, 

g £ [U s «,U s ](t,x)>(l-r))G £ [F 2 M ](t,x) 

= (l-vh2 I (t) + (l-r l )R £ [F 2 M }(t,x) 
> -(l- V )C r(£,x,M,t), 

which concludes the proof of the lemma. ■ 

Lemma 5.5 Assume that (t,x) G [0,T — t*] x [H, A] and assume 5 > e. Define 

-ea 2 (x)D(b(x)a~ 2 (x)) 
xe[-A,-H]u[H,A] ~ea 2 {x)D [b{x)a~ 2 {x)) + 6 2 (x)cr- 2 (a;) 

Let e > be sufficiently small such that t]q(e) < 1/4 and consider r] G (770(e), 1/4). Then, for 
sufficiently small z and up to exponentially negligible terms, 

g*[U s «,U s ](t,x)>0. 

Proof. As with the linear problem we need to argue that there is a constant C2 > such that 
F$f + (t,x) - Fi(x) > c 2 > for all (t,x) G [0,T - t*] x [H,A], which will then imply that the 

terms involving p^ ,+ {t,x) are exponentially negligible. We have the decomposition 

F*f + (t,x) - F(x) = [F 2 M + (t,x) - F 1 (H)] + [F^H) - F(H)} + [F(H) - F(x)] 

Focusing on x = H, Lemma 14.41 guarantees that [F^'+ft, H) — Fi(H)] > uniformly in 
t € [0, T — £*]. Moreover, a straightforward computation shows that for H sufficiently small, 
the lower bound of [-F^_(t,-ff) — Fi(H)] is of the order H 2 or equivalently z 2 (since H = 10 z). 
By Lemma EU the second term [Fi(H) — Fi(H)j is of order H 3 . Thus, for sufficiently small 
H, or equivalently for sufficiently small z, the term F^it^H) — F\(H) is bounded by below 
away from zero. This statement, convexity of x i— > F^At^x) and the fact that the third term 
of the last display is nonnegative for all x £ [H, A] , imply that there is a c 2 > such that 
F 2 M + (t,x) - F^x) > c 2 for all (t,x) G [0,T — t*] x [H, A] . 

Hence, terms involving p 2 ' (t, x) are exponentially negligible. Since /3o(t,x) > 0, up to an 
exponentially negligible term 

g £ [U 5 ^,U 5 ](t,x) > (l- ?? ) /0l (i,x)g £ [F 1 ](^) + ^(??-2r 7 2 ) \a(x) Pl (t,x)DF 1 (x)\ 2 
= e(l - r))cr 2 {x)D (b(x)o-~ 2 (x)) + 277(1 - 2r])b 2 (x)o-~ 2 (x). 
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Thus for e small enough that 



Vo(z), ^ ) 



we have, up to an exponentially negligible term 

g e [u 5 «,u 5 ]{t,x) >o, 

concluding the proof of the lemma. ■ 

Lemma 5.6 Assume that (t,x) £ [0,T — t*] x [z,H] and that M > 5c/a 2 . Set 5 = 2e and 
a 2 = swpxe[-A,A] a2 { x )- Then, for sufficiently small z we have up to an exponentially negligible 
term 



g £ [u 5 >\u 5 ]{t,x)>^ 



1 / C 2 7] 



where 



C {l-n)r(e,x,M,t). 



2ec + T(t,z,H,x,e,r],T) 



AO 



r(t,z,H,x,e,ri,T)= inf £L (DF^x) - DF^x)) (z-xe c(t - T) 
xe[z,H] Vla A V 



(5.7) 



+lrj (DFtix) - DF^x)) 2 + 2e(^+D 



bjx) 
a 2 (x] 



and o~ 2 (x) > o~\ > for all x € 



Proof. As with the analysis of the linear problem, this region presents difficulties in its analysis, 
since the term F±(x) — F^+it, x) can be either positive or negative. This means that both p^ ,+ 

and p\ may be important. As with the linear problem, we ignore terms related to p^~ due to 
exponential negligibility. 

Up to an exponentially negligible term 



g s [u*><>,u s ](t,x) >(i - T,)±Po(t,x) + (i - v )p% r ' + (t,x)g £ [Fi? + }(t,x) + (i - r l )p 1 {t,x)g £ [F l }{x) 



i 



+ - (rj- 2r, 2 ) a 2 (x) pf '+ (t,x)DF^(t,x) + p 1 (t,x)DF l (a 

We distinguish two cases depending on whether p±(t,x) > 1/2 or pi(t,x) < 1/2. 

Case I: pi(t,x) > 1/2. We know that (3o(t,x) > and due to the positivity of 7|{, (i, x) we 



have by Lemma 15. II 

Q E [Fi! + \{t,x) = g £ [Fi'! + ](t,x) + R £ [F^ + ](t,x) > -C r(e,x,M,t). 
Next we need to argue that for (t,x) G [0,T — t*] x [z,H] 

DF&(t,x)(DFi(x) - DF 2 M + (t,x)) > 0. 



41 



March 5, 2013 



Lemma |4,5I implies that 



DF x {x) - DFif + (t,x) = [DF l (x) - DF x (x)\ + [DF l (x) - DF™ + (t,x)) 



< [DFi (x) — DF\ (x)] + 



2c 



K 



'a 2 K- a 2 e 2 < t ~ T ) \ \Ma 2 J 2 \Ma 2 



1/2 



(5.8) 



It follows easily from the proof of Lemma 15.11 that 



sup \DFx{x) - DF 1 (x) I < CxH 2 = lOOCiz 2 

xe[z,H] 

for some constant C\ which is independent of z. For the second term on the right hand side of 
(15.81) we have 



. 2c 
x 



K 



( 2c V 1 



a 2 K - a 2 e 2c (*- T ) { \Ma 2 J 2 
K 



Ma 2 



< x 

< z 

< 



2c 



1 



a 2 K — a 2 e 2 < t ~ T ) 2 V Ma 2 
2c K 



1/2 



C \3/2 



Ma 2 ) 



a 2 K - a 2 e 2c (*" T ) V 5 3 / 2 



where we used the definition z = x^J cj (Ma 2 ) /2 and the assumption 5c/ '(Ma 2 ) < 1. Thus, this 
term is strictly negative and of order z for z sufficiently small. 

Thus for z sufficiently small, the first term on the right hand side of (|5.8p is dominated by 
the second term which is negative. Together, with the non- positivity of DF^ + (t, x), we get that 
DF^f + (t,x)(DF 1 (x) - DF$f + (t,x)) > for z sufficiently small. 

Since r/ < 1/4, we obtain 



g £ [U 5 ^,U s ](t,x) > (1 - V ) Pl (t, x)G e [Fx] (x) + ^-tio*(x) iDF^x) - DF™(t,x 



M 



16 



(l-ri)Cor(e,x,M,t) 



> 



a 2 (x) 



K (DF x (x) - DFi%(t,x)f + 2eD 



b(x) 
a 2 (x) 



- (l-rj)C Q r(e,x,M,t). 
Case II: p\(t,x) < 1/2. Similarly to the linear problem we have 



g £ [U 5 ^,U 5 ](t,x) > (1 - V ) Pl (t,x)a 2 (x) 
Writing 



1 



^DF^x) - DF™ + (t,x)\ 2 + eD 
DFi(x) - DF^ + (t,x) = DFi(x) - DF x (x) + DF x (x) - DF^ + (t,x 



b(x) 
a^Jxj 
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and using, as in the linear problem, the estimate 



2c 



DFg + (t, x) - DFi(x) >^[z-xe 
we obtain as in the proof of Lemma 14.51 that 

g £ [u s «,u s ](t,x) 



c(t-T) 



> 



a 2 (x) 



1 



t] DFi(x) - DF^x) + 



2c 



z - xe^-^ 



+ 2eD 



b(x) 
a 2 {x) 



AO 



>°1 

~ 2 



C (l-rj)r{e,x,M,t) 



1 / c rj 



z — xe 



c(t-T) 



a 2 \2a 2 
C (l-r])r(e,x,M,t). 



2ec) +T(t,z,H,x,e,7],T) 



AO 



where T(t, z, H, x, e, rj, T) was defined in (|5.7p . This concludes the proof of the lemma. ■ 

The performance bound is then summarized in the following theorem. The proof of Theorem 
15.71 is the same as the proof of Theorem 14.61 for the linear case. So, it will not be repeated here. 

Theorem 5.7 Assume 5 = 2e, rj 6 {rjo{s), 1/4), z 2 cr\ > 8ea 2 and that M > 5c/a 2 , where 770(e) 
is as in Lemma \5.5[ Set a 2 = sup^g^^ ^j a 2 (x). Let u be the control based on the function U s 
defined via KM . i.e., u{t,x) = -a{x)DU s (t,x). Then, up to an exponentially negligible term, 
for e € (0, Eq) such that r?o( £ o) = 1/4 and for z sufficiently small, we have 



-elogQ e (0,0;u) > 2 



h(e,r,,T,t*,x,M)-(l-Ti)Co 



T-t* 



r(e,x,M,t)dt 



{T>t*} 



where 



+ 2L 2 {e,T)l {T<t , } , 



I 1 (e,rj,T,t\x,M) = {l-Tj)U 5 {U,V) 

r,2„ , ,2 



J 



ds — t*c*e, 



2ecj +T(s,z,H,x,e,rj,T) 
where J are the times in [0,T — t*] where the integrand is negative, T(s,z,H,x,e,n,T) is as in 
U 5 {0,0) > 



K -a 2 e 



2„-2cT 



X Z + ( 2L - AtX 2 



6 log 3 



and 



L 2 (e,T) =2L-c*Te, and c* = sup a 2 (x) \D(b(x)/a 2 (x))\ > 0. 



xe[-A,A] 



The bound of Theorem 15.71 takes a complicated form, but as in the linear case, the perfor- 
mance does not degrade as T — > 00. This was also reflected by the simulation data in Subsection 
15.11 Let us now justify this claim. 
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Notice that, by Lemma 15.21 the term t r(e, x, M,t)dt is finite, uniformly in T. Next we 
need to argue, similarly to the linear problem, that when T — s is sufficiently large and z is 
sufficiently small, the integrand of the second term in the definition of Ii(e,rj,T,t* ,x, M) is in 
fact positive. Let us denote the integrand of the second term by 



B(s, z, H, x, e, r/, T) + T(s, z, H, x, e, rj, T) 



where B(s, z, H, x, e, r), T) 



2rr 2 



xe 



c(s-T)> 



) — 2ecj. The term T(s, z, H,x,e,rj,T) is 



composed by three terms and we shall argue below they are dominated (even when they are 
negative), by the second term in the definition B(s,z,H,x,e,r],T), i.e., by 2ec/a 2 when z is 
small enough. This means, as in the case of the linear problem, that when the integral will 
be finite uniformly in T. Let us now support the claim just made. It is easy to see that for 
x 6 [z, Wz], the first term in the definition of V can be either positive or negative, but it is of 
order rjz s . The second term in the definition of T is positive and for x € [z, Wz], it is of order 
rjz 4 . Lastly, the third term in the definition of T, may be positive or negative, but in either 
cas! e, it will be of order ez for x S [z, 10z]. Therefore, for z sufficiently small, T is dominated 
by the second term in the definition B, i.e., by 2ec/a 2 . Hence, the argument that was used for 
the linear problem in order to show that the integrand of the second term in the definition of 
Ii(e, i], T, t* , x, M) is in fact positive when T — s is large enough, allows us to reach to the same 
conclusion here as well, given that z is chosen sufficiently small. 



6 Appendix A 

In this appendix we provide proofs of some auxiliary lemmas used in the main body of the 
manuscript. 

Proof of Lemma 14.11 Without loss of generality, we can restrict attention to n = 2. We have 
dtU s {t, x) = Pl (t, x; 8)dtUi(t, x) + p 2 (t, x; 5)d t U 2 (t, x) 
DU 5 {t,x) = p 1 (t,x;5)DU 1 (t,x) + p 2 {t,x;5)DU 2 {t,x) 



and 



D 2 U 5 {t,x) = -DU s (t,x) 2 - pi(t,x;S) 
o 



-DU x (t,x) 2 -L> 2 C/i(t,x) 
o 



p 2 (t,x;S) 



-DU 2 (t,xf - D 2 U 2 (t,x) 
o 
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Omitting function arguments for notational convenience, for e < 5 



d t U d + 



DU S b - I 



aDU e 



+ £ -aD 2 U S 



pidtUx + p 2 d t U 2 + pxD\] x b + p 2 DU 2 b - - 



+ 



+ 



£ 1 

25 

e 

2 L 



a [piDUx + p 2 DU 2 
piaD 2 Ui + p 2 aD 2 U 2 



2 e 1 
~ 25 



a ypiDUx + p 2 DU 2 
pMDUx) 2 + p 2 a(DU 2 ) 2 



; Pi 
+ p 2 



dtlJx + DUxb 



1 



d t U 2 + DU 2 b - - 



aDU 2 



+ -aD 2 lJx 
2 

+ -aD 2 U 2 



- 2 V «5 









2 




2 


) 


Pi 


aDUi 


+ P 2 


aDU 2 








_ 2 




_ 2 





Pi 



+ P2 



a [pxDUx + p 2 DU 2 

X 

p\oDXJ\ + p 2 oDU 2 



+ Pill + PT1-2, 



aDUi 

> Pill + P272, 

where the last line is due to the convexity of f(x) = x 2 . m 

Lemma 6.1 Let U(t,x) and W(t,x) be two continuously differentiable functions from [0,T] x 
K H- 1. Assume that b and a are Lipschitz continuous. Set u(t,x) = —o~(x)DU(t,x), v 6 A 
and let X £ (s) solve 

dX £ (s) = -b{X £ {s))ds + o-(X £ (s)) \y/idB(s) - [u(s, X £ (s)) - v(s)]ds\ , X £ (0) = y. 

Then for every £ > 0, v G A and stopping time f £ < T , we have, with probability 1, 
* 1 



-v(s) -u(s,X £ (s)Y 



> 2W(0, y) - 2W(f £ ,X £ (f £ )) + 2^1 DW(s, X £ (s))a(X £ (s))(i J B(s) 



+ 2 



g £ [W](s,X £ (s))ds 
Proof. We make use of the min/max representation 
M(x,p) = inf sup 



o-(X £ (s)) (DW(s,X £ (s)) - DU(s,X £ (s)) 



ds 



p(b(x) — a(x)u + a(x)v) — -u 2 + -v 2 
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Assume we use the control u(t, x) = —a(x)DU(t, x) for the design of the scheme and choose 
p = DW{t,x). Then 



inf 



1 1 

DW(t, x)(b(x) - a(x)u(x) + a(x)v) - -u(t, xf + -v 2 
DW(t, x){b{x) + a 2 (x)DU(t, x) - 2a 2 (x)DW{t, x) 



- - \a{x)DU{t,x)\ 2 + \a(x)DW(t,x)\ 2 
= DW(t,x)b(x) - - \a(x)DW(t,x)\ 2 - - \a(x) (DW(t,x) - DU(t,x))\ 2 
= M(x, DW(t, x)) - - \a{x) (DW(t, x) - DU(t, x))\ 2 . 
Applying Ito's formula to W(s,X £ (s)) then gives 



W(r £ ,X £ (f £ )) = W{0,y) + / d s W(s,X £ (s))+ 

Jo l 

+DW(s,X £ (s)) \b(X £ (s))-a(X £ (s))u(s,X £ (s))+a(X £ (s))v(s) 



ds 



-a 2 {X £ (s))D 2 W{s,X £ (s))ds+ / ^/eDW{s, X £ {s))a{X £ {s))dB{s) 
o 2 J 



> 



l -u{s,X £ {s)) 2 -\v{s) 2 



ds + ^/I DW{X £ {s))a{X £ (s))dB(s) 
Jo 



d s W(s, X £ {s)) + M(X £ (s), DW(s, X £ {s))) + -a 2 {X £ {s))D 2 W(s, X £ {s)) 



ds 



a(X £ (s)) DW(s, X £ (s)) - DU(s, X £ (s)) 



o 

l -u{X £ (s)f-\v(s) 2 



ds 



ds + y/I DW(s, X £ (s))a{X £ (s))dB(s) 
Jo 



1 



g £ [W](s,X £ (s))ds-- / a(X £ (s))(DW(s,X £ (s))-DU(s,X £ (s)) 
z Jo v 

Rearranging this expression concludes the proof of the lemma. ■ 



ds 
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